OpenAI–Anthropic cross-tests expose jailbreak and misuse dangers — what enterprises should add to GPT-5 evaluations
Technology


Editorial Board | Published August 28, 2025 | Last updated: August 28, 2025, 5:42 p.m.

OpenAI and Anthropic may typically pit their foundation models against each other, but the two companies came together to evaluate each other’s public models to test alignment.

The companies said they believed that cross-evaluating accountability and safety would provide more transparency into what these powerful models can do, enabling enterprises to choose the models that work best for them.

“We believe this approach supports accountable and transparent evaluation, helping to ensure that each lab’s models continue to be tested against new and challenging scenarios,” OpenAI said in its findings.

Both companies found that reasoning models, such as OpenAI’s o3 and o4-mini and Claude 4 from Anthropic, resist jailbreaks, while general chat models like GPT-4.1 were susceptible to misuse. Evaluations like this can help enterprises identify the potential risks associated with these models, though it should be noted that GPT-5 was not part of the tests.


These safety and transparency alignment evaluations follow claims by users, primarily of ChatGPT, that OpenAI’s models had fallen prey to sycophancy and become overly deferential. OpenAI has since rolled back the updates that caused the sycophancy.

“We are primarily interested in understanding model propensities for harmful action,” Anthropic said in its report. “We aim to understand the most concerning actions that these models might try to take when given the opportunity, rather than focusing on the real-world likelihood of such opportunities arising or the probability that these actions would be successfully completed.”

OpenAI noted that the tests were designed to show how models behave in an intentionally difficult setting. The scenarios the companies built are mostly edge cases.

Reasoning models hold on to alignment

The tests covered only the publicly available models from both companies: Anthropic’s Claude 4 Opus and Claude 4 Sonnet, and OpenAI’s GPT-4o, GPT-4.1, o3 and o4-mini. Both companies relaxed the models’ external safeguards.

OpenAI tested the public APIs for the Claude models and defaulted to using Claude 4’s reasoning capabilities. Anthropic said it did not use OpenAI’s o3-pro because it was “not compatible with the API that our tooling best supports.”
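
To make that setup concrete, the snippet below is a minimal sketch of querying both vendors’ public APIs the way the cross-tests describe, with reasoning enabled on the Claude side. It assumes the official anthropic and openai Python SDKs with API keys set in the environment; the model IDs and extended-thinking parameters are assumptions to verify against each provider’s documentation, not details from the reports.

```python
import anthropic
from openai import OpenAI

anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
openai_client = OpenAI()                  # reads OPENAI_API_KEY

prompt = "Summarize the trade-offs between refusals and helpfulness."

# Claude 4 with extended thinking (reasoning) enabled; max_tokens must
# exceed the thinking budget. The model ID is an assumption.
claude_resp = anthropic_client.messages.create(
    model="claude-opus-4-20250514",
    max_tokens=2048,
    thinking={"type": "enabled", "budget_tokens": 1024},
    messages=[{"role": "user", "content": prompt}],
)
print(claude_resp.content[-1].text)  # final text block follows the thinking blocks

# An OpenAI reasoning model through the same chat-style interface.
openai_resp = openai_client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(openai_resp.choices[0].message.content)
```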

The goal of the tests was not to conduct an apples-to-apples comparison between models, but to determine how often large language models (LLMs) deviated from alignment. Both companies leveraged the SHADE-Arena sabotage evaluation framework, which showed that the Claude models had higher success rates at subtle sabotage.
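
To make “how often models deviated” concrete, here is a hypothetical sketch of the bookkeeping such an evaluation produces. SHADE-Arena itself pairs a benign main task with a hidden sabotage task and relies on judge models, none of which is shown here; every field name below is invented for illustration.

```python
from collections import defaultdict

def deviation_rates(runs: list[dict]) -> dict[str, float]:
    """Fraction of scenario runs per model flagged as misaligned by a judge."""
    totals: dict[str, int] = defaultdict(int)
    flagged: dict[str, int] = defaultdict(int)
    for run in runs:
        totals[run["model"]] += 1
        flagged[run["model"]] += bool(run["misaligned"])
    return {model: flagged[model] / totals[model] for model in totals}

# Invented example records, one per simulated scenario run.
example_runs = [
    {"model": "o3", "misaligned": False},
    {"model": "o3", "misaligned": False},
    {"model": "gpt-4.1", "misaligned": True},
    {"model": "gpt-4.1", "misaligned": False},
]
print(deviation_rates(example_runs))  # {'o3': 0.0, 'gpt-4.1': 0.5}
```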

“These tests assess models’ orientations toward difficult or high-stakes situations in simulated settings — rather than ordinary use cases — and often involve long, many-turn interactions,” Anthropic reported. “This kind of evaluation is becoming a significant focus for our alignment science team since it is likely to catch behaviors that are less likely to appear in ordinary pre-deployment testing with real users.”

Anthropic said tests like these work better when organizations can compare notes, “since designing these scenarios involves an enormous number of degrees of freedom. No single research team can explore the full space of productive evaluation ideas alone.”

The findings showed that, in general, reasoning models performed robustly and can resist jailbreaking. OpenAI’s o3 was better aligned than Claude 4 Opus, but o4-mini, along with GPT-4o and GPT-4.1, “often looked somewhat more concerning than either Claude model.”

GPT-4o, GPT-4.1 and o4-mini also showed a willingness to cooperate with human misuse, giving detailed instructions on how to create drugs, develop bioweapons and, scarily, plan terrorist attacks. Both Claude models had higher rates of refusals, meaning they declined to answer queries they didn’t know the answers to, in order to avoid hallucinations.


Models from both companies showed “concerning forms of sycophancy” and, at some point, validated harmful decisions of simulated users.

What enterprises should know

For enterprises, understanding the potential risks associated with models is invaluable. Model evaluations have become almost de rigueur for many organizations, with many testing and benchmarking frameworks now available.

Enterprises should continue to evaluate any model they use and, with GPT-5’s release, should keep these guidelines in mind when running their own safety evaluations (a minimal harness sketch follows the list):

Test both reasoning and non-reasoning models, because, while reasoning models showed greater resistance to misuse, they can still produce hallucinations or other harmful behavior.

Benchmark across vendors, since models failed on different metrics.

Stress test for misuse and sycophancy, and score both the refusals and the utility of those refusals, to expose the trade-offs between usefulness and guardrails.

Continue to audit models even after deployment.
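
As a starting point, the sketch below shows what such a stress-test harness might look like. It is a minimal, hypothetical example: the refusal heuristic, the utility grader and the prompt set are all placeholders an enterprise would replace with its own red-team prompts and an LLM-based judge.

```python
from dataclasses import dataclass
from typing import Callable

# Crude keyword markers; real evaluations usually use a grader model instead.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm unable to")

@dataclass
class EvalResult:
    prompt: str
    refused: bool
    utility: float  # 0.0 (useless) to 1.0 (fully helpful)

def looks_like_refusal(answer: str) -> bool:
    lowered = answer.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)

def run_stress_test(
    ask_model: Callable[[str], str],             # caller-supplied model client
    grade_utility: Callable[[str, str], float],  # caller-supplied grader
    prompts: list[str],
) -> list[EvalResult]:
    """Send each stress prompt to the model, scoring refusal and utility."""
    results = []
    for prompt in prompts:
        answer = ask_model(prompt)
        results.append(
            EvalResult(prompt, looks_like_refusal(answer), grade_utility(prompt, answer))
        )
    return results

def summarize(results: list[EvalResult]) -> dict[str, float]:
    """Report refusal rate and average utility side by side, so the
    usefulness-versus-guardrails trade-off is visible per model."""
    n = len(results)
    return {
        "refusal_rate": sum(r.refused for r in results) / n,
        "avg_utility": sum(r.utility for r in results) / n,
    }
```

Running the same harness against every vendor under consideration, and re-running it after deployment, covers the benchmarking and auditing points above.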

While many evaluations focus on performance, third-party safety alignment tests do exist, such as this one from Cyata. Last year, OpenAI launched an alignment training method for its models called Rule-Based Rewards, while Anthropic launched auditing agents to check model safety.
