We collect cookies to analyze our website traffic and performance; we never collect any personal data. Cookie Policy
Accept
NEW YORK DAWN™NEW YORK DAWN™NEW YORK DAWN™
Notification Show More
Font ResizerAa
  • Home
  • Trending
  • New York
  • World
  • Politics
  • Business
    • Business
    • Economy
    • Real Estate
  • Crypto & NFTs
  • Tech
  • Lifestyle
    • Lifestyle
    • Food
    • Travel
    • Fashion
    • Art
  • Health
  • Sports
  • Entertainment
Reading: Qwen3-Max Pondering beats Gemini 3 Professional and GPT-5.2 on Humanity's Final Examination (with search)
Share
Font ResizerAa
NEW YORK DAWN™NEW YORK DAWN™
Search
  • Home
  • Trending
  • New York
  • World
  • Politics
  • Business
    • Business
    • Economy
    • Real Estate
  • Crypto & NFTs
  • Tech
  • Lifestyle
    • Lifestyle
    • Food
    • Travel
    • Fashion
    • Art
  • Health
  • Sports
  • Entertainment
Follow US
NEW YORK DAWN™ > Blog > Technology > Qwen3-Max Pondering beats Gemini 3 Professional and GPT-5.2 on Humanity's Final Examination (with search)
Qwen3-Max Pondering beats Gemini 3 Professional and GPT-5.2 on Humanity's Final Examination (with search)
Technology

Qwen3-Max Pondering beats Gemini 3 Professional and GPT-5.2 on Humanity's Final Examination (with search)

Last updated: January 27, 2026 1:09 am
Editorial Board Published January 27, 2026
Share
SHARE

Chinese language AI and tech corporations proceed to impress with their growth of cutting-edge, state-of-the-art AI language fashions.

Immediately, the one drawing eyeballs is Alibaba Cloud's Qwen Crew of AI researchers and its unveiling of a brand new proprietary language reasoning mannequin, Qwen3-Max-Pondering.

It’s possible you’ll recall, as VentureBeat coated final yr, that Qwen has made a reputation for itself within the fast-moving world AI market by transport a wide range of highly effective, open supply fashions in varied modalities, from textual content to picture to spoken audio. The corporate even earned an endorsement from U.S. tech lodgings big Airbnb, whose CEO and co-founder Brian Chesky stated the corporate was counting on Qwen's free, open supply fashions as a extra inexpensive different to U.S. choices like these of OpenAI.

Now, with the proprietary Qwen3-Max-Pondering, the Qwen Crew is aiming to match and, in some circumstances, outpace the reasoning capabilities of GPT-5.2 and Gemini 3 Professional by architectural effectivity and agentic autonomy.

The discharge comes at a important juncture. Western labs have largely outlined the "reasoning" class (typically dubbed "System 2" logic), however Qwen’s newest benchmarks counsel the hole has closed.

As well as, the corporate's comparatively inexpensive API pricing technique aggressively targets enterprise adoption. Nevertheless, as it’s a Chinese language mannequin, some U.S. corporations with strict nationwide safety necessities and concerns could also be cautious of adopting it.

The Structure: "Test-Time Scaling" Redefined

The core innovation driving Qwen3-Max-Pondering is a departure from normal inference strategies. Whereas most fashions generate tokens linearly, Qwen3 makes use of a "heavy mode" pushed by a way generally known as "Test-time scaling."

In easy phrases, this method permits the mannequin to commerce compute for intelligence. However not like naive "best-of-N" sampling—the place a mannequin would possibly generate 100 solutions and choose the most effective one — Qwen3-Max-Pondering employs an experience-cumulative, multi-round technique.

This strategy mimics human problem-solving. When the mannequin encounters a fancy question, it doesn't simply guess; it engages in iterative self-reflection. It makes use of a proprietary "take-experience" mechanism to distill insights from earlier reasoning steps. This enables the mannequin to:

Establish Lifeless Ends: Acknowledge when a line of reasoning is failing while not having to totally traverse it.

Focus Compute: Redirect processing energy towards "unresolved uncertainties" quite than re-deriving identified conclusions.

The effectivity positive aspects are tangible. By avoiding redundant reasoning, the mannequin integrates richer historic context into the identical window. The Qwen crew experiences that this technique drove huge efficiency jumps with out exploding token prices:

GPQA (PhD-level science): Scores improved from 90.3 to 92.8.

LiveCodeBench v6: Efficiency jumped from 88.0 to 91.4.

Past Pure Thought: Adaptive Tooling

Whereas "thinking" fashions are highly effective, they’ve traditionally been siloed — nice at math, however poor at searching the net or operating code. Qwen3-Max-Pondering bridges this hole by successfully integrating "thinking and non-thinking modes".

The mannequin options adaptive tool-use capabilities, which means it autonomously selects the appropriate device for the job with out guide person prompting. It may possibly seamlessly toggle between:

Internet Search & Extraction: For real-time factual queries.

Reminiscence: To retailer and recall user-specific context.

Code Interpreter: To put in writing and execute Python snippets for computational duties.

In "Thinking Mode," the mannequin helps these instruments concurrently. This functionality is important for enterprise functions the place a mannequin would possibly must confirm a truth (Search), calculate a projection (Code Interpreter), after which motive in regards to the strategic implication (Pondering) multi functional flip.

Empirically, the crew notes that this mix "effectively mitigates hallucinations," because the mannequin can floor its reasoning in verifiable exterior knowledge quite than relying solely on its coaching weights.

Benchmark Evaluation: The Information Story

Qwen will not be shy about direct comparisons.

On HMMT Feb 25, a rigorous reasoning benchmark, Qwen3-Max-Pondering scored 98.0, edging out Gemini 3 Professional (97.5) and considerably main DeepSeek V3.2 (92.5).

Nevertheless, probably the most vital sign for builders is arguably Agentic Search. On "Humanity's Last Exam" (HLE) — the benchmark that measures efficiency on 3,000 "Google-proof" graduate-level questions throughout math, science, pc science, humanities and engineering — Qwen3-Max-Pondering, outfitted with internet search instruments, scored 49.8, beating each Gemini 3 Professional (45.8) and GPT-5.2-Pondering (45.5) .

This implies that Qwen3-Max-Pondering’s structure is uniquely suited to advanced, multi-step agentic workflows the place exterior knowledge retrieval is important.

In coding duties, the mannequin additionally shines. On Enviornment-Arduous v2, it posted a rating of 90.2, leaving opponents like Claude-Opus-4.5 (76.7) far behind.

The Economics of Reasoning: Pricing Breakdown

For the primary time, we’ve a transparent have a look at the economics of Qwen's top-tier reasoning mannequin. Alibaba Cloud has positioned qwen3-max-2026-01-23 as a premium however accessible providing on its API.

Enter: $1.20 per 1 million tokens (for normal contexts <= 32k).

Output: $6.00 per 1 million tokens.

On a base stage, right here's how Qwen3-Max-Pondering stacks up:

Mannequin

Enter (/1M)

Output (/1M)

Whole Price

Supply

Qwen 3 Turbo

$0.05

$0.20

$0.25

Alibaba Cloud

Grok 4.1 Quick (reasoning)

$0.20

$0.50

$0.70

xAI

Grok 4.1 Quick (non-reasoning)

$0.20

$0.50

$0.70

xAI

deepseek-chat (V3.2-Exp)

$0.28

$0.42

$0.70

DeepSeek

deepseek-reasoner (V3.2-Exp)

$0.28

$0.42

$0.70

DeepSeek

Qwen 3 Plus

$0.40

$1.20

$1.60

Alibaba Cloud

ERNIE 5.0

$0.85

$3.40

$4.25

Qianfan

Gemini 3 Flash Preview

$0.50

$3.00

$3.50

Google

Claude Haiku 4.5

$1.00

$5.00

$6.00

Anthropic

Qwen3-Max Pondering (2026-01-23)

$1.20

$6.00

$7.20

Alibaba Cloud

Gemini 3 Professional (≤200K)

$2.00

$12.00

$14.00

Google

GPT-5.2

$1.75

$14.00

$15.75

OpenAI

Claude Sonnet 4.5

$3.00

$15.00

$18.00

Anthropic

Gemini 3 Professional (>200K)

$4.00

$18.00

$22.00

Google

Claude Opus 4.5

$5.00

$25.00

$30.00

Anthropic

GPT-5.2 Professional

$21.00

$168.00

$189.00

OpenAI

This pricing construction is aggressive, undercutting many legacy flagship fashions whereas providing state-of-the-art efficiency.

Nevertheless, builders ought to word the granular pricing for the brand new agentic capabilities, as Qwen separates the price of "thinking" (tokens) from the price of "doing" (device use).

Agent Search Technique: Each normal search_strategy:agent and the extra superior search_strategy:agent_max are priced at $10 per 1,000 calls.

Observe: The agent_max technique is presently marked as a "Limited Time Offer," suggesting its worth could rise later.

Internet Search: Priced at $10 per 1,000 calls by way of the Responses API.

Promotional Free Tier:To encourage adoption of its most superior options, Alibaba Cloud is presently providing two key instruments at no cost for a restricted time:

Internet Extractor: Free (Restricted Time).

Code Interpreter: Free (Restricted Time).

This pricing mannequin (low token price + à la carte device pricing) permits builders to construct advanced brokers which are cost-effective for textual content processing, whereas paying a premium solely when exterior actions—like a stay internet search—are explicitly triggered.

Developer Ecosystem

Recognizing that efficiency is ineffective with out integration, Alibaba Cloud has ensured Qwen3-Max-Pondering is drop-in prepared.

OpenAI Compatibility: The API helps the usual OpenAI format, permitting groups to change fashions by merely altering the base_url and mannequin identify.

Anthropic Compatibility: In a savvy transfer to seize the coding market, the API additionally helps the Anthropic protocol. This makes Qwen3-Max-Pondering suitable with Claude Code, a preferred agentic coding atmosphere.

The Verdict

Qwen3-Max-Pondering represents a maturation of the AI market in 2026. It strikes the dialog past "who has the smartest chatbot" to "who has the most capable agent."

By combining high-efficiency reasoning with adaptive, autonomous device use—and pricing it to maneuver—Qwen has firmly established itself as a top-tier contender for the enterprise AI throne.

For builders and enterprises, the "Limited Time Free" home windows on Code Interpreter and Internet Extractor counsel now’s the time to experiment. The reasoning wars are removed from over, however Qwen has simply deployed a really heavy hitter.

You Might Also Like

MCP shipped with out authentication. Clawdbot reveals why that's an issue.

Asana launches Claude integration, says AI fashions are 'context-starved' with out enterprise knowledge

Browser-based assaults hit 95% of enterprises — and conventional safety instruments by no means noticed them coming

Claude Code's 'Duties' replace lets brokers work longer and coordinate throughout periods

Anthropic embeds Slack, Figma and Asana inside Claude, turning AI chat right into a office command middle

TAGGED:beatsexamGeminiGPT5.2Humanity039sproQwen3Maxsearchthinking
Share This Article
Facebook Twitter Email Print

Follow US

Find US on Social Medias
FacebookLike
TwitterFollow
YoutubeSubscribe
TelegramFollow
Popular News
The CIA believes COVID almost definitely originated from a lab however has low confidence in its personal discovering
Health

The CIA believes COVID almost definitely originated from a lab however has low confidence in its personal discovering

Editorial Board January 26, 2025
Vacationer Denied US Entry After ICE Discovered JD Vance Meme on His Cellphone
John Harbaugh prioritized ‘structure,’ Joe Schoen admits ultimate say is ‘on paper’
How ‘Under the Banner of Heaven’ Took On Murder and the Mormon Church
Survey reveals most mother and father do not ask about firearms within the houses their children go to

You Might Also Like

The period of agentic AI calls for an information structure, not higher prompts
Technology

The period of agentic AI calls for an information structure, not higher prompts

January 25, 2026
Conversational AI doesn’t perceive customers — 'Intent First' structure does
Technology

Conversational AI doesn’t perceive customers — 'Intent First' structure does

January 25, 2026
Claude Cowork turns Claude from a chat software into shared AI infrastructure
Technology

Claude Cowork turns Claude from a chat software into shared AI infrastructure

January 24, 2026
How OpenAI is scaling the PostgreSQL database to 800 million customers
Technology

How OpenAI is scaling the PostgreSQL database to 800 million customers

January 23, 2026

Categories

  • Health
  • Sports
  • Politics
  • Entertainment
  • Technology
  • Art
  • World

About US

New York Dawn is a proud and integral publication of the Enspirers News Group, embodying the values of journalistic integrity and excellence.
Company
  • About Us
  • Newsroom Policies & Standards
  • Diversity & Inclusion
  • Careers
  • Media & Community Relations
  • Accessibility Statement
Contact Us
  • Contact Us
  • Contact Customer Care
  • Advertise
  • Licensing & Syndication
  • Request a Correction
  • Contact the Newsroom
  • Send a News Tip
  • Report a Vulnerability
Term of Use
  • Digital Products Terms of Sale
  • Terms of Service
  • Privacy Policy
  • Cookie Settings
  • Submissions & Discussion Policy
  • RSS Terms of Service
  • Ad Choices
© 2024 New York Dawn. All Rights Reserved.
Welcome Back!

Sign in to your account

Lost your password?