QwenLong-L1 solves long-context reasoning challenge that stumps current LLMs
Technology

Editorial Board | Published May 31, 2025 | Last updated: May 31, 2025, 1:40 am

Alibaba Group has introduced QwenLong-L1, a new framework that enables large language models (LLMs) to reason over extremely long inputs. This development could unlock a new wave of enterprise applications that require models to understand and draw insights from extensive documents such as detailed corporate filings, lengthy financial statements, or complex legal contracts.

The challenge of long-form reasoning for AI

Recent advances in large reasoning models (LRMs), particularly through reinforcement learning (RL), have significantly improved their problem-solving capabilities. Research shows that when trained with RL fine-tuning, LRMs acquire skills similar to human “slow thinking,” where they develop sophisticated strategies to tackle complex tasks.

However, these improvements are primarily seen when models work with relatively short pieces of text, typically around 4,000 tokens. The ability of these models to scale their reasoning to much longer contexts (e.g., 120,000 tokens) remains a major challenge. Such long-form reasoning requires a robust understanding of the entire context and the ability to perform multi-step analysis. “This limitation poses a significant barrier to practical applications requiring interaction with external knowledge, such as deep research, where LRMs must collect and process information from knowledge-intensive environments,” the developers of QwenLong-L1 write in their paper.

The researchers formalize these challenges into the concept of “long-context reasoning RL.” Unlike short-context reasoning, which often relies on knowledge already stored within the model, long-context reasoning RL requires models to retrieve and ground relevant information from lengthy inputs accurately. Only then can they generate chains of reasoning based on this incorporated information.

Training models for this through RL is tricky and often results in inefficient learning and unstable optimization processes. Models struggle to converge on good solutions or lose their ability to explore diverse reasoning paths.

QwenLong-L1: A multi-stage approach

QwenLong-L1 is a reinforcement learning framework designed to help LRMs transition from proficiency with short texts to robust generalization across long contexts. The framework enhances existing short-context LRMs through a carefully structured, multi-stage process (a minimal code sketch follows the three stages below):

Warm-up Supervised Fine-Tuning (SFT): The model first undergoes an SFT phase, where it is trained on examples of long-context reasoning. This stage establishes a solid foundation, enabling the model to ground information accurately from long inputs. It helps develop fundamental capabilities in understanding context, generating logical reasoning chains, and extracting answers.

Curriculum-Guided Phased RL: At this stage, the model is trained through multiple phases, with the target length of the input documents gradually increasing. This systematic, step-by-step approach helps the model stably adapt its reasoning strategies from shorter to progressively longer contexts. It avoids the instability often seen when models are abruptly trained on very long texts.

Difficulty-Aware Retrospective Sampling: The final training stage incorporates challenging examples from the preceding training phases, ensuring the model continues to learn from the hardest problems. This prioritizes difficult instances and encourages the model to explore more diverse and complex reasoning paths.
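To make the recipe concrete, here is a minimal Python sketch of how the three stages fit together. The helpers `sft_train`, `rl_train`, and `hardest_examples`, along with the stage lengths, are hypothetical placeholders standing in for the paper's actual training routines and settings, not QwenLong-L1 APIs.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Example:
    prompt: str          # long document plus question
    answer: str          # ground-truth answer
    input_tokens: int    # length of the input context

# NOTE: sft_train, rl_train, and hardest_examples are hypothetical
# placeholders for the real training routines described in the paper.

def train_qwenlong_l1_style(model,
                            sft_data: List[Example],
                            rl_data: List[Example],
                            stage_max_lengths: Tuple[int, ...] = (20_000, 60_000)):
    # Stage 1: warm-up SFT on long-context reasoning examples, so the
    # model learns to ground answers in long inputs before RL begins.
    model = sft_train(model, sft_data)

    hard_pool: List[Example] = []  # hardest examples carried across phases
    for max_len in stage_max_lengths:
        # Stage 2: curriculum-guided phased RL -- each phase only sees
        # inputs up to the current target length, which grows per phase.
        phase_data = [ex for ex in rl_data if ex.input_tokens <= max_len]

        # Stage 3: difficulty-aware retrospective sampling -- re-inject
        # the hardest examples from earlier phases so the model keeps
        # exploring diverse reasoning paths on tough cases.
        phase_data += hard_pool

        model, rollout_stats = rl_train(model, phase_data)
        hard_pool = hardest_examples(phase_data, rollout_stats)

    return model
```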

The QwenLong-L1 process (Source: arXiv)

Beyond this structured training, QwenLong-L1 also uses a distinct reward system. While training for short-context reasoning tasks often relies on strict rule-based rewards (e.g., a correct answer in a math problem), QwenLong-L1 employs a hybrid reward mechanism. This combines rule-based verification, which ensures precision by checking for strict adherence to correctness criteria, with an “LLM-as-a-judge.” This judge model compares the semantics of the generated answer with the ground truth, allowing for more flexibility and better handling of the diverse ways correct answers can be expressed when dealing with long, nuanced documents.
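In code, one simple way to combine the two signals might look like the sketch below. The normalization rule and the `llm_judge` helper (scoring semantic equivalence in [0, 1]) are assumptions for illustration, not functions from the QwenLong-L1 release.

```python
import re

def rule_based_reward(prediction: str, reference: str) -> float:
    """Strict verification: exact match after light normalization."""
    norm = lambda s: re.sub(r"\s+", " ", s.strip().lower())
    return 1.0 if norm(prediction) == norm(reference) else 0.0

def hybrid_reward(prediction: str, reference: str) -> float:
    """Combine strict rule-based checking with an LLM judge."""
    rule_score = rule_based_reward(prediction, reference)
    if rule_score == 1.0:
        return 1.0  # the strict check already confirms correctness
    # Fall back to a judge model that scores whether the answer is
    # semantically equivalent to the ground truth, to credit correct
    # answers phrased differently. `llm_judge` is a hypothetical helper.
    return llm_judge(prediction, reference)
```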

Putting QwenLong-L1 to the test

The Alibaba team evaluated QwenLong-L1 using document question-answering (DocQA) as the primary task. This scenario is highly relevant to enterprise needs, where AI must understand dense documents to answer complex questions.
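As a rough illustration (the prompt template, file name, and `model.generate` interface here are assumptions, not taken from the paper's benchmarks), a DocQA evaluation item pairs a long document with a question the model must answer from that document alone:

```python
# An illustrative DocQA-style evaluation item; everything named here
# is assumed for the sketch rather than drawn from the benchmarks.
with open("annual_report_2024.txt") as f:
    document = f.read()  # e.g., tens of thousands of tokens of filings

question = "What drove the year-over-year change in operating margin?"

prompt = (
    "Read the document and answer the question based only on its contents.\n\n"
    f"<document>\n{document}\n</document>\n\n"
    f"Question: {question}\n"
    "Reason step by step, then state the final answer."
)

response = model.generate(prompt)  # any long-context reasoning model
```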

Experimental results across seven long-context DocQA benchmarks demonstrated QwenLong-L1’s capabilities. Notably, the QWENLONG-L1-32B model (based on DeepSeek-R1-Distill-Qwen-32B) achieved performance comparable to Anthropic’s Claude-3.7 Sonnet Thinking, and outperformed models like OpenAI’s o3-mini and Qwen3-235B-A22B. The smaller QWENLONG-L1-14B model also outperformed Google’s Gemini 2.0 Flash Thinking and Qwen3-32B.

Source: arXiv

An important finding relevant to real-world applications is how RL training results in the model developing specialized long-context reasoning behaviors. The paper notes that models trained with QwenLong-L1 become better at “grounding” (linking answers to specific parts of a document), “subgoal setting” (breaking down complex questions), “backtracking” (recognizing and correcting their own mistakes mid-reasoning), and “verification” (double-checking their answers).

For instance, while a base model might get sidetracked by irrelevant details in a financial document or get stuck in a loop of over-analyzing unrelated information, the QwenLong-L1-trained model demonstrated an ability to engage in effective self-reflection. It could successfully filter out these distractor details, backtrack from incorrect paths, and arrive at the correct answer.

Techniques like QwenLong-L1 could significantly expand the utility of AI in the enterprise. Potential applications include legal tech (analyzing thousands of pages of legal documents), finance (deep research on annual reports and financial filings for risk assessment or investment opportunities), and customer service (analyzing long customer interaction histories to provide more informed support). The researchers have released the code for the QwenLong-L1 recipe and the weights for the trained models.
