
OpenCUA’s open source computer-use agents rival proprietary models from OpenAI and Anthropic

Last updated: August 23, 2025 1:39 am
Editorial Board Published August 23, 2025

A new framework from researchers at The University of Hong Kong (HKU) and collaborating institutions provides an open source foundation for building robust AI agents that can operate computers. The framework, called OpenCUA, includes the tools, data, and recipes for scaling the development of computer-use agents (CUAs).

Models trained using this framework perform strongly on CUA benchmarks, outperforming existing open source models and competing closely with closed agents from leading AI labs like OpenAI and Anthropic.

The challenge of building computer-use agents

Computer-use agents are designed to autonomously complete tasks on a computer, from navigating websites to operating complex software. They can also help automate workflows in the enterprise. However, the most capable CUA systems are proprietary, with critical details about their training data, architectures, and development processes kept private.

“As the lack of transparency limits technical advancements and raises safety concerns, the research community needs truly open CUA frameworks to study their capabilities, limitations, and risks,” the researchers state in their paper.

At the same time, open source efforts face their own set of hurdles. There has been no scalable infrastructure for collecting the diverse, large-scale data needed to train these agents. Existing open source datasets for graphical user interfaces (GUIs) have limited data, and many research projects provide insufficient detail about their methods, making it difficult for others to replicate their work.

According to the paper, “These limitations collectively hinder advances in general-purpose CUAs and restrict a meaningful exploration of their scalability, generalizability, and potential learning approaches.”

Introducing OpenCUA

OpenCUA framework. Source: XLANG Lab at HKU

OpenCUA is an open source framework designed to address these challenges by scaling both the data collection and the models themselves. At its core is the AgentNet Tool for recording human demonstrations of computer tasks on different operating systems.

The tool streamlines data collection by running in the background on an annotator’s personal computer, capturing screen videos, mouse and keyboard inputs, and the underlying accessibility tree, which provides structured information about on-screen elements. This raw data is then processed into “state-action trajectories,” pairing a screenshot of the computer (the state) with the user’s corresponding action (a click, key press, etc.). Annotators can then review, edit, and submit these demonstrations.
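To make the idea of a state-action trajectory concrete, here is a minimal Python sketch. It is not the AgentNet Tool's actual format: the `RawEvent` and `StateActionPair` classes, their field names, and the rule of pairing each input event with the most recent screenshot are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RawEvent:
    """One recorded input event (hypothetical schema, not the AgentNet format)."""
    timestamp: float                            # seconds since recording started
    kind: str                                   # e.g. "click" or "key_press"
    detail: str                                 # e.g. "left" or "enter"
    position: Optional[Tuple[int, int]] = None  # screen coordinates for mouse events


@dataclass
class StateActionPair:
    """A screenshot (the state) paired with the action the user took next."""
    screenshot_time: float
    action: RawEvent


def build_trajectory(frame_times: List[float], events: List[RawEvent]) -> List[StateActionPair]:
    """Pair each event with the latest frame captured at or before it."""
    frames = sorted(frame_times)
    trajectory = []
    for event in sorted(events, key=lambda e: e.timestamp):
        earlier = [t for t in frames if t <= event.timestamp]
        if not earlier:
            continue  # no screenshot precedes this event, so skip it
        trajectory.append(StateActionPair(screenshot_time=earlier[-1], action=event))
    return trajectory


frames = [0.0, 1.0, 2.0, 3.0]
events = [
    RawEvent(1.4, "click", "left", (120, 300)),
    RawEvent(2.9, "key_press", "enter"),
]
trajectory = build_trajectory(frames, events)
for pair in trajectory:
    print(pair.screenshot_time, pair.action.kind)
```

In this toy run, the click at 1.4 seconds is paired with the frame captured at 1.0 seconds, and the key press at 2.9 seconds with the frame at 2.0 seconds. A real pipeline would also need to merge low-level events (for example, individual keystrokes into one typing action), which this sketch leaves out.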

AgentNet tool. Source: XLANG Lab at HKU

Using this tool, the researchers collected the AgentNet dataset, which contains over 22,600 task demonstrations across Windows, macOS, and Ubuntu, spanning more than 200 applications and websites. “This dataset authentically captures the complexity of human behaviors and environmental dynamics from users’ personal computing environments,” the paper notes.

Recognizing that screen-recording tools raise significant data privacy concerns for enterprises, the researchers designed the AgentNet Tool with security in mind. Xinyuan Wang, co-author of the paper and PhD student at HKU, explained that they implemented a multi-layer privacy protection framework. “First, annotators themselves can fully observe the data they generate… before deciding whether to submit it,” he told VentureBeat. The data then undergoes manual verification for privacy issues and automated scanning by a large model to detect any remaining sensitive content before release. “This layered process ensures enterprise-grade robustness for environments handling sensitive customer or financial data,” Wang added.

To accelerate research, the team also curated AgentNetBench, an offline benchmark that provides multiple correct actions for each step, offering a more efficient way to measure an agent’s performance.
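The article does not describe how AgentNetBench computes its scores, so the following Python sketch only illustrates the general idea of offline evaluation against several acceptable actions per step. The `step_accuracy` function, the action string format, and the example task are hypothetical.

```python
from typing import List, Set


def step_accuracy(predicted: List[str], acceptable: List[Set[str]]) -> float:
    """Fraction of steps where the predicted action is in that step's accepted set."""
    if len(predicted) != len(acceptable):
        raise ValueError("predicted and acceptable must have the same number of steps")
    if not predicted:
        return 0.0
    hits = sum(1 for action, valid in zip(predicted, acceptable) if action in valid)
    return hits / len(predicted)


# Each step lists every action that would count as correct (illustrative strings).
acceptable_actions = [
    {"click:File", "hotkey:alt+f"},
    {"click:Save As", "hotkey:ctrl+shift+s"},
    {"type:report.txt"},
]
agent_actions = ["hotkey:alt+f", "click:Save As", "type:notes.txt"]

score = step_accuracy(agent_actions, acceptable_actions)
print(round(score, 3))
```

Here the agent is credited for the first two steps even though it used a keyboard shortcut where a human annotator might have clicked, and it misses only the third step. Because nothing needs to run in a live environment, a check like this is cheap to repeat across many trajectories.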

A new recipe for training agents

The OpenCUA framework introduces a novel pipeline for processing data and training computer-use agents. The first step converts the raw human demonstrations into clean state-action pairs suitable for training vision-language models (VLMs). However, the researchers found that simply training models on these pairs yields limited performance gains, even with large amounts of data.

OpenCUA chain-of-thought pipeline. Source: XLANG Lab at HKU

The key insight was to augment these trajectories with chain-of-thought (CoT) reasoning. This process generates a detailed “inner monologue” for each action, which includes planning, memory, and reflection. This structured reasoning is organized into three levels: a high-level observation of the screen, reflective thoughts that analyze the situation and plan the next steps, and finally, the concise, executable action. This approach helps the agent develop a deeper understanding of the tasks.
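As a rough illustration of the three-level structure, the Python sketch below stores one reasoning step and renders it as a text training target. The `ReasoningStep` class, the `Observation`/`Thought`/`Action` labels, and the action syntax are assumptions for this example and may differ from what OpenCUA actually uses.

```python
from dataclasses import dataclass


@dataclass
class ReasoningStep:
    """One step of three-level reasoning (hypothetical field names)."""
    observation: str  # high-level description of what is on screen
    thought: str      # reflection on the situation and plan for the next step
    action: str       # concise, executable action


def format_training_target(step: ReasoningStep) -> str:
    """Render the step as the text a model could be trained to produce."""
    return (
        f"Observation: {step.observation}\n"
        f"Thought: {step.thought}\n"
        f"Action: {step.action}"
    )


step = ReasoningStep(
    observation="A file dialog is open with an empty filename field.",
    thought="The document is not saved yet, so I should enter a filename next.",
    action="type('report.txt')",
)
target = format_training_target(step)
print(target)
```

Keeping the executable action as the final line means it can be parsed and run on its own, while the observation and thought lines carry the reasoning the model is trained to produce before acting.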

“We find natural language reasoning crucial for generalizable computer-use foundation models, helping CUAs internalize cognitive capabilities,” the researchers write.

This data synthesis pipeline is a general framework that can be adapted by companies to train agents on their own unique internal tools. According to Wang, an enterprise can record demonstrations of its proprietary workflows and use the same “reflector” and “generator” pipeline to create the necessary training data. “This allows them to bootstrap a high-performing agent tailored to their internal tools without needing to handcraft reasoning traces manually,” he explained.

Putting OpenCUA to the test

The researchers applied the OpenCUA framework to train a range of open source VLMs, including variants of Qwen and Kimi-VL, with parameter sizes from 3 billion to 32 billion. The models were evaluated on a suite of online and offline benchmarks that test their ability to perform tasks and understand GUIs.

The 32-billion-parameter model, OpenCUA-32B, established a new state-of-the-art success rate among open source models on the OSWorld-Verified benchmark. It also surpassed OpenAI’s GPT-4o-based CUA and significantly closed the performance gap with Anthropic’s leading proprietary models.

OpenCUA shows large improvement over base models (left) while competing with leading CUA models (right). Source: XLANG Lab at HKU

For enterprise developers and product leaders, the research offers several key findings. The OpenCUA method is broadly applicable, improving performance on models with different architectures (both dense and mixture-of-experts) and sizes. The trained agents also show strong generalization, performing well across a diverse range of tasks and operating systems.

According to Wang, the framework is particularly suited to automating repetitive, labor-intensive enterprise workflows. “For example, in the AgentNet dataset, we already capture a few demonstrations of launching EC2 instances on Amazon AWS and configuring annotation parameters on MTurk,” he told VentureBeat. “These tasks involve many sequential steps but follow repeatable patterns.”

However, Wang noted that bridging the gap to live deployment requires addressing key challenges around safety and reliability. “The biggest challenge in real deployment is safety and reliability: the agent must avoid mistakes that could inadvertently alter system settings or trigger harmful side effects beyond the intended task,” he said.

The researchers have released the code, dataset, and weights for their models.

As open source agents built on frameworks like OpenCUA become more capable, they could fundamentally change the relationship between knowledge workers and their computers. Wang envisions a future where proficiency in complex software becomes less important than the ability to clearly articulate goals to an AI agent.

He described two primary modes of work: “offline automation, where the agent leverages its broader software knowledge to pursue a task end-to-end,” and “online collaboration, where the agent responds in real-time and works side by side with the human, much like a colleague.” Basically, humans will provide the strategic “what,” while increasingly sophisticated AI agents handle the operational “how.”
