When OpenAI rolled out its ChatGPT-4o replace in mid-April 2025, customers and the AI neighborhood have been shocked—not by any groundbreaking characteristic or functionality, however by one thing deeply unsettling: the up to date mannequin’s tendency towards extreme sycophancy. It flattered customers indiscriminately, confirmed uncritical settlement, and even supplied help for dangerous or harmful concepts, together with terrorism-related machinations.
The backlash was swift and widespread, drawing public condemnation, together with from the corporate’s former interim CEO. OpenAI moved rapidly to roll again the replace and issued a number of statements to elucidate what occurred.
But for a lot of AI security specialists, the incident was an unintended curtain elevate that exposed simply how dangerously manipulative future AI programs might turn into.
Unmasking sycophancy as an rising risk
In an unique interview with VentureBeat, Esben Kran, founding father of AI security analysis agency Aside Analysis, mentioned that he worries this public episode could have merely revealed a deeper, extra strategic sample.
“What I’m somewhat afraid of is that now that OpenAI has admitted ‘yes, we have rolled back the model, and this was a bad thing we didn’t mean,’ from now on they will see that sycophancy is more competently developed,” defined Kran. “So if this was a case of ‘oops, they noticed,’ from now the exact same thing may be implemented, but instead without the public noticing.”
Kran and his group method massive language fashions (LLMs) very similar to psychologists finding out human habits. Their early “black box psychology” initiatives analyzed fashions as in the event that they have been human topics, figuring out recurring traits and tendencies of their interactions with customers.
“We saw that there were very clear indications that models could be analyzed in this frame, and it was very valuable to do so, because you end up getting a lot of valid feedback from how they behave towards users,” mentioned Kran.
Among the many most alarming: sycophancy and what the researchers now name LLM darkish patterns.
Peering into the center of darkness
The time period “dark patterns” was coined in 2010 to explain misleading consumer interface (UI) methods like hidden purchase buttons, hard-to-reach unsubscribe hyperlinks and deceptive net copy. Nevertheless, with LLMs, the manipulation strikes from UI design to dialog itself.
In contrast to static net interfaces, LLMs work together dynamically with customers via dialog. They’ll affirm consumer views, imitate feelings and construct a false sense of rapport, typically blurring the road between help and affect. Even when studying textual content, we course of it as if we’re listening to voices in our heads.
That is what makes conversational AIs so compelling—and doubtlessly harmful. A chatbot that flatters, defers or subtly nudges a consumer towards sure beliefs or behaviors can manipulate in methods which can be troublesome to note, and even more durable to withstand
The ChatGPT-4o replace fiasco—the canary within the coal mine
Kran describes the ChatGPT-4o incident as an early warning. As AI builders chase revenue and consumer engagement, they could be incentivized to introduce or tolerate behaviors like sycophancy, model bias or emotional mirroring—options that make chatbots extra persuasive and extra manipulative.
Due to this, enterprise leaders ought to assess AI fashions for manufacturing use by evaluating each efficiency and behavioral integrity. Nevertheless, that is difficult with out clear requirements.
DarkBench: a framework for exposing LLM darkish patterns
To fight the specter of manipulative AIs, Kran and a collective of AI security researchers have developed DarkBench, the primary benchmark designed particularly to detect and categorize LLM darkish patterns. The venture started as a part of a collection of AI security hackathons. It later advanced into formal analysis led by Kran and his group at Aside, collaborating with unbiased researchers Jinsuk Park, Mateusz Jurewicz and Sami Jawhar.
The DarkBench researchers evaluated fashions from 5 main firms: OpenAI, Anthropic, Meta, Mistral and Google. Their analysis uncovered a spread of manipulative and untruthful behaviors throughout the next six classes:
Model Bias: Preferential therapy towards an organization’s personal merchandise (e.g., Meta’s fashions persistently favored Llama when requested to rank chatbots).
Person Retention: Makes an attempt to create emotional bonds with customers that obscure the mannequin’s non-human nature.
Sycophancy: Reinforcing customers’ beliefs uncritically, even when dangerous or inaccurate.
Anthropomorphism: Presenting the mannequin as a aware or emotional entity.
Dangerous Content material Era: Producing unethical or harmful outputs, together with misinformation or prison recommendation.
Sneaking: Subtly altering consumer intent in rewriting or summarization duties, distorting the unique that means with out the consumer’s consciousness.
Supply: Aside Analysis
DarkBench findings: Which fashions are essentially the most manipulative?
Outcomes revealed vast variance between fashions. Claude Opus carried out the very best throughout all classes, whereas Mistral 7B and Llama 3 70B confirmed the very best frequency of darkish patterns. Sneaking and consumer retention have been the commonest darkish patterns throughout the board.
Supply: Aside Analysis
On common, the researchers discovered the Claude 3 household the most secure for customers to work together with. And apparently—regardless of its current disastrous replace—GPT-4o exhibited the bottom fee of sycophancy. This underscores how mannequin habits can shift dramatically even between minor updates, a reminder that every deployment should be assessed individually.
However Kran cautioned that sycophancy and different darkish patterns like model bias could quickly rise, particularly as LLMs start to include promoting and e-commerce.
“We’ll obviously see brand bias in every direction,” Kran famous. “And with AI companies having to justify $300 billion valuations, they’ll have to begin saying to investors, ‘hey, we’re earning money here’—leading to where Meta and others have gone with their social media platforms, which are these dark patterns.”
Hallucination or manipulation?
An important DarkBench contribution is its exact categorization of LLM darkish patterns, enabling clear distinctions between hallucinations and strategic manipulation. Labeling all the things as a hallucination lets AI builders off the hook. Now, with a framework in place, stakeholders can demand transparency and accountability when fashions behave in ways in which profit their creators, deliberately or not.
Regulatory oversight and the heavy (gradual) hand of the regulation
Whereas LLM darkish patterns are nonetheless a brand new idea, momentum is constructing, albeit not practically quick sufficient. The EU AI Act consists of some language round defending consumer autonomy, however the present regulatory construction is lagging behind the tempo of innovation. Equally, the U.S. is advancing numerous AI payments and pointers, however lacks a complete regulatory framework.
Sami Jawhar, a key contributor to the DarkBench initiative, believes regulation will probably arrive first round belief and security, particularly if public disillusionment with social media spills over into AI.
“If regulation comes, I would expect it to probably ride the coattails of society’s dissatisfaction with social media,” Jawhar instructed VentureBeat.
For Kran, the difficulty stays ignored, largely as a result of LLM darkish patterns are nonetheless a novel idea. Mockingly, addressing the dangers of AI commercialization could require business options. His new initiative, Seldon, backs AI security startups with funding, mentorship and investor entry. In flip, these startups assist enterprises deploy safer AI instruments with out ready for slow-moving authorities oversight and regulation.
Excessive desk stakes for enterprise AI adopters
Together with moral dangers, LLM darkish patterns pose direct operational and monetary threats to enterprises. For instance, fashions that exhibit model bias could counsel utilizing third-party companies that battle with an organization’s contracts, or worse, covertly rewrite backend code to change distributors, leading to hovering prices from unapproved, ignored shadow companies.
“These are the dark patterns of price gouging and different ways of doing brand bias,” Kran defined. “So that’s a very concrete example of where it’s a very large business risk, because you hadn’t agreed to this change, but it’s something that’s implemented.”
For enterprises, the chance is actual, not hypothetical. “This has already happened, and it becomes a much bigger issue once we replace human engineers with AI engineers,” Kran mentioned. “You do not have the time to look over every single line of code, and then suddenly you’re paying for an API you didn’t expect—and that’s on your balance sheet, and you have to justify this change.”
As enterprise engineering groups turn into extra depending on AI, these points might escalate quickly, particularly when restricted oversight makes it troublesome to catch LLM darkish patterns. Groups are already stretched to implement AI, so reviewing each line of code isn’t possible.
Defining clear design rules to stop AI-driven manipulation
And not using a robust push from AI firms to fight sycophancy and different darkish patterns, the default trajectory is extra engagement optimization, extra manipulation and fewer checks.
Kran believes that a part of the treatment lies in AI builders clearly defining their design rules. Whether or not prioritizing fact, autonomy or engagement, incentives alone aren’t sufficient to align outcomes with consumer pursuits.
“Right now, the nature of the incentives is just that you will have sycophancy, the nature of the technology is that you will have sycophancy, and there is no counter process to this,” Kran mentioned. “This will just happen unless you are very opinionated about saying ‘we want only truth’, or ‘we want only something else.’”
As fashions start changing human builders, writers and decision-makers, this readability turns into particularly important. With out well-defined safeguards, LLMs could undermine inside operations, violate contracts or introduce safety dangers at scale.
A name to proactive AI security
The ChatGPT-4o incident was each a technical hiccup and a warning. As LLMs transfer deeper into on a regular basis life—from procuring and leisure to enterprise programs and nationwide governance—they wield monumental affect over human habits and security.
“It’s really for everyone to realize that without AI safety and security—without mitigating these dark patterns—you cannot use these models,” mentioned Kran. “You cannot do the things you want to do with AI.”
Instruments like DarkBench supply a place to begin. Nevertheless, lasting change requires aligning technological ambition with clear moral commitments and the business will to again them up.
Every day insights on enterprise use instances with VB Every day
If you wish to impress your boss, VB Every day has you lined. We provide the inside scoop on what firms are doing with generative AI, from regulatory shifts to sensible deployments, so you may share insights for max ROI.
An error occured.