Google's 'Watch & Learn' framework cracks the data bottleneck for training computer-use agents

Last updated: October 27, 2025 9:03 pm
Editorial Board Published October 27, 2025

A new framework developed by researchers at Google Cloud and DeepMind aims to address one of the key challenges of developing computer use agents (CUAs): Gathering high-quality training examples at scale.

The framework, dubbed Watch & Learn (W&L), addresses the problem of training data generation in a way that doesn’t require human annotation and can automatically extract demonstrations from raw videos.

Their experiments show that data generated by W&L can be used to train or fine-tune existing computer use and foundation models to improve their performance on computer-use tasks. But equally important, the same approach can be used to create in-context learning (ICL) examples for computer use agents, enabling companies to create CUAs for bespoke internal tasks without the need for costly training of specialized models.

The data bottleneck of CUA

The web is rich with video tutorials and screencasts that describe complex workflows for using applications. These videos are a gold mine that can provide computer use agents with domain knowledge and instructions for accomplishing different tasks through user interface interactions.

However, before they can be used to train CUAs, these videos need to be transformed into annotated trajectories (that is, a set of task descriptions, screenshots and actions), a process that is prohibitively expensive and time-consuming when done manually.

Existing approaches to address this data bottleneck rely on annotating these videos with multimodal language models, which usually results in low precision and faulty examples. A different approach uses self-play agents that autonomously explore user interfaces to collect trajectories. However, techniques using this approach usually create simple examples that are not useful in unpredictable real-world situations.

As the researchers note in their paper, “Overall, these approaches either rely on brittle heuristics, are costly as they rely on explorations in real environments or generate low-complexity demonstrations misaligned with human intent.”

Watch & Learn

The Watch & Learn framework tries to address the challenges of creating CUA demonstrations by rethinking the problem formulation.

Instead of directly generating trajectories or depending on complex multi-stage pipelines, the researchers frame the problem as an “inverse dynamics objective”: Given two consecutive observations, predict the intermediate action that produced the transition.
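
One common way to write an objective of this kind is shown below. It is an illustrative formalization, not the paper's own notation: the symbols $o_t$ and $o_{t+1}$ (consecutive observations), $a_t$ (the action between them), $p_\theta$ (the model with parameters $\theta$) and $\mathcal{D}$ (the set of recorded transitions) are all assumptions introduced here.

```latex
\hat{a}_t = \arg\max_{a} \; p_\theta\!\left(a \mid o_t, o_{t+1}\right),
\qquad
\mathcal{L}(\theta) = -\sum_{(o_t,\, a_t,\, o_{t+1}) \in \mathcal{D}} \log p_\theta\!\left(a_t \mid o_t, o_{t+1}\right)
```

Read this way, the model only has to explain a single observed change between two screens, which is a narrower task than planning a whole trajectory from scratch.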

According to the researchers, this formulation is “easier to learn, avoids hand-crafted heuristics and generalizes robustly across applications.”

The W&L framework can be broken down into three key stages: Training an inverse dynamics model (IDM), retrieving raw videos, and training CUAs.

In the first stage, the researchers used agents to interact with live web pages to create a large corpus of 500,000 state transitions (two consecutive observations and the action that resulted in the transition). They then used this data (along with 132,000 human-annotated transitions from existing open datasets) to train an inverse dynamics model (IDM) that takes in two consecutive observations and predicts the transition action. Their trained IDM, which is a small transformer model, outperformed off-the-shelf foundation models in predicting transition actions.
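
To make the idea of a state transition concrete, here is a minimal Python sketch of how such records might be collected. Everything in it is a stand-in chosen for illustration: `ToyPage` replaces a live web page, observations are short strings instead of screenshots, and the agent picks actions at random. The article does not describe the researchers' collection code, so none of these names come from the paper.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One (before, action, after) record, the unit an IDM is trained on."""
    before: str  # observation before the action (a screenshot in practice)
    action: str  # the action the agent executed
    after: str   # observation after the action


class ToyPage:
    """Hypothetical stand-in for a live web page (assumption, not from the paper)."""
    ACTIONS = ("scroll_down", "scroll_up", "click")

    def __init__(self) -> None:
        self.scroll = 0
        self.clicks = 0

    def observe(self) -> str:
        return f"scroll={self.scroll};clicks={self.clicks}"

    def step(self, action: str) -> None:
        if action == "scroll_down":
            self.scroll += 1
        elif action == "scroll_up":
            self.scroll -= 1
        elif action == "click":
            self.clicks += 1
        else:
            raise ValueError(f"unknown action: {action}")


def collect_transitions(num_steps: int, seed: int = 0) -> list[Transition]:
    """Let a random agent act on the toy page and log every transition."""
    rng = random.Random(seed)
    page = ToyPage()
    data = []
    for _ in range(num_steps):
        before = page.observe()
        action = rng.choice(ToyPage.ACTIONS)
        page.step(action)
        data.append(Transition(before, action, page.observe()))
    return data
```

Because the agent knows which action it just executed, every record comes with a correct label at no annotation cost, which is what makes this kind of data suitable for supervising an IDM.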

The researchers then designed a pipeline that retrieves videos from platforms such as YouTube and runs them through the IDM to generate high-quality trajectories. The IDM takes in consecutive video frames and determines the actions (such as scroll or click) that caused the changes in the environment, which are then packaged into annotated trajectories. Using this method, they generated 53,125 trajectories with high-accuracy action labels.
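
The labeling step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the researchers' pipeline: frames are plain dictionaries rather than images, `toy_idm` is a rule-based placeholder for the trained model, and skipping frame pairs with no visible change is a design choice made here for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# Hypothetical frame representation, e.g. {"scroll": 0, "clicked": None}
Frame = dict


@dataclass
class Step:
    frame: Frame  # what the screen looked like before the action
    action: str   # action label predicted by the IDM


@dataclass
class Trajectory:
    task: str
    steps: list[Step] = field(default_factory=list)


def toy_idm(before: Frame, after: Frame) -> Optional[str]:
    """Rule-based placeholder for a trained IDM; returns None if nothing changed."""
    if after["scroll"] != before["scroll"]:
        return "scroll"
    if after["clicked"] != before["clicked"]:
        return f"click({after['clicked']})"
    return None


def label_video(
    task: str,
    frames: list[Frame],
    idm: Callable[[Frame, Frame], Optional[str]],
) -> Trajectory:
    """Run the IDM over each pair of consecutive frames and keep the labeled steps."""
    trajectory = Trajectory(task=task)
    for before, after in zip(frames, frames[1:]):
        action = idm(before, after)
        if action is not None:  # assumption: drop pairs with no detectable action
            trajectory.steps.append(Step(frame=before, action=action))
    return trajectory
```

Passing the IDM in as a callable keeps the loop independent of the model, so the placeholder could be swapped for a real predictor without changing `label_video`.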

These examples can be used to train effective computer use models for specific tasks. But the researchers also found that trajectories extracted through the IDM can serve as in-context learning examples to improve the performance of CUAs on bespoke tasks at inference time. For ICL, they use Gemini 2.5 Flash to add additional reasoning annotations to the observation/action examples in the trajectories, which can then be inserted into the CUA’s prompt (usually 3-5 examples) during inference.
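
As a rough illustration of the in-context use, the sketch below assembles a text prompt from a handful of annotated demonstrations. The prompt wording, the `Demo` and `AnnotatedStep` types and the `build_prompt` helper are all hypothetical; the article does not give the actual prompt format, and a real agent would also pass screenshots rather than text descriptions.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedStep:
    observation: str  # text standing in for a screenshot
    reasoning: str    # annotation explaining why the action was taken
    action: str


@dataclass
class Demo:
    task: str
    steps: list[AnnotatedStep]


def build_prompt(new_task: str, demos: list[Demo], max_demos: int = 5) -> str:
    """Place up to max_demos annotated demonstrations ahead of the new task."""
    parts = ["You control a computer. Worked examples follow."]
    for i, demo in enumerate(demos[:max_demos], start=1):
        parts.append(f"Example {i}: {demo.task}")
        for step in demo.steps:
            parts.append(f"  Observation: {step.observation}")
            parts.append(f"  Reasoning: {step.reasoning}")
            parts.append(f"  Action: {step.action}")
    parts.append(f"Task: {new_task}")
    parts.append("Next action:")
    return "\n".join(parts)
```

The helper simply takes the first few demonstrations it is given. Choosing which demonstrations are most relevant to the new task is a separate retrieval problem that this sketch leaves out.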

“This dual role (training and in-context guidance) enables flexible integration with both open-source models and general-purpose agents,” the researchers write.

W&L in action

To test the usefulness of W&L, the researchers ran a series of experiments with closed and open source models on the OSWorld benchmark, which evaluates agents in real desktop and operating system environments across different tasks, including productivity, programming and design.

For fine-tuning, they used their corpus of roughly 53,000 trajectories to train two open source models: UI-TARS-1.5, a strong, open source vision-language-action model designed specifically for computer use, and Qwen 2.5-VL, an open-weight multimodal LLM.

For in-context learning tests, they applied W&L examples to general-purpose multimodal models such as Gemini 2.5 Flash, OpenAI o3 and Claude Sonnet 4.

W&L resulted in improvements on OSWorld in all model categories, including up to 3 points for ICL on general-purpose models and up to 11 points for fine-tuned open-source models.

More importantly, these benefits were achieved without any manual annotation, “demonstrating that web-scale human workflows can serve as a practical and scalable foundation for advancing CUAs towards real-world deployment,” the researchers write.

This could have important implications for real-world applications, enabling enterprises to turn their existing corpora of videos and conference recordings into training data for CUAs. It also makes it easier to generate new training trajectories. All you need to do is record videos of people performing different tasks and have them annotated by an IDM. And with frontier models constantly improving and becoming cheaper, you can expect to get more from your existing data as the field continues to progress.
