Technology

From hallucinations to hardware: Lessons from a real-world computer vision project gone sideways

Last updated: July 7, 2025 3:15 pm
Editorial Board | Published July 7, 2025

Computer vision projects rarely go exactly as planned, and this one was no exception. The idea was simple: build a model that could look at a photo of a laptop and identify any physical damage, such as cracked screens, missing keys or broken hinges. It seemed like a straightforward use case for image models and large language models (LLMs), but it quickly turned into something more complicated.

Along the way, we ran into issues with hallucinations, unreliable outputs and images that weren't even laptops. To solve these, we ended up applying an agentic framework in an atypical way: not for task automation, but to improve the model's performance.

In this post, we'll walk through what we tried, what didn't work and how a combination of approaches ultimately helped us build something reliable.

Where we started: Monolithic prompting

Our initial approach was fairly standard for a multimodal model. We used a single, large prompt to pass an image into an image-capable LLM and asked it to identify visible damage. This monolithic prompting strategy is simple to implement and works decently for clean, well-defined tasks. But real-world data rarely plays along.
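To make the setup concrete, here is a minimal sketch of that monolithic pass, assuming an OpenAI-compatible vision endpoint; the model name, prompt wording and JSON output format are illustrative placeholders rather than the project's actual configuration.

```python
import base64

from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint and an API key in the environment

MONOLITHIC_PROMPT = (
    "You are inspecting a photo of a laptop. List any visible physical damage "
    "(cracked screen, missing keys, broken hinges, dents). If the photo does not "
    "show a laptop, say so. Respond as JSON."
)

def vision_llm(image_b64: str, prompt: str, model: str = "gpt-4o") -> str:
    """Send one image plus one text prompt to an image-capable chat model."""
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

def inspect_laptop(path: str) -> str:
    """Monolithic pass: one big prompt, one answer for the whole image."""
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    return vision_llm(image_b64, MONOLITHIC_PROMPT)
```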

We ran into three major issues early on:

Hallucinations: The model would sometimes invent damage that didn't exist or mislabel what it was seeing.

Junk image detection: It had no reliable way to flag images that weren't laptops at all; photos of desks, walls or people occasionally slipped through and received nonsensical damage reports.

Inconsistent accuracy: The combination of these problems made the model too unreliable for operational use.

This was the point when it became clear we would need to iterate.

First fix: Mixing image resolutions

One thing we noticed was how much image quality affected the model's output. Users uploaded all kinds of images, ranging from sharp and high-resolution to blurry. This led us to research highlighting how image resolution affects deep learning models.

We trained and tested the model using a mix of high- and low-resolution images. The idea was to make the model more resilient to the wide range of image qualities it would encounter in practice. This helped improve consistency, but the core issues of hallucination and junk image handling persisted.
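One simple way to build that kind of training mix is to synthetically degrade a portion of the clean images by downscaling and re-upscaling them. The sketch below is a generic illustration using Pillow; the degradation factor, file layout and 50/50 split are assumptions, not the exact recipe used in the project.

```python
import random
from pathlib import Path

from PIL import Image

def degrade(img: Image.Image, factor: int = 4) -> Image.Image:
    """Downscale then upscale to mimic a blurry, low-resolution upload."""
    w, h = img.size
    small = img.resize((max(1, w // factor), max(1, h // factor)), Image.Resampling.BILINEAR)
    return small.resize((w, h), Image.Resampling.BILINEAR)

def build_mixed_resolution_set(src_dir: str, dst_dir: str, low_res_fraction: float = 0.5) -> None:
    """Copy a clean image set, synthetically degrading a fraction of it."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in Path(src_dir).glob("*.jpg"):
        img = Image.open(path).convert("RGB")
        if random.random() < low_res_fraction:
            img = degrade(img)
        img.save(out / path.name, quality=85)
```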

The multimodal detour: Text-only LLM goes multimodal

Encouraged by recent experiments combining image captioning with text-only LLMs, like the approach covered in The Batch, where captions are generated from images and then interpreted by a language model, we decided to give it a try.

Here's how it works (a rough sketch of the loop follows these steps):

The LLM starts by generating several possible captions for an image.

Another model, a multimodal embedding model, checks how well each caption matches the image. In this case, we used SigLIP to score the similarity between the image and the text.

The system keeps the top few captions based on these scores.

The LLM uses these top captions to write new ones, trying to get closer to what the image actually shows.

It repeats this process until the captions stop improving, or it hits a set limit.
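The sketch below assumes the Hugging Face transformers SigLIP checkpoint google/siglip-base-patch16-224 for scoring, plus a hypothetical generate_captions() callable that wraps the text-only LLM (not shown); the stopping rule and the number of captions kept are illustrative choices, not the project's exact settings.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

CHECKPOINT = "google/siglip-base-patch16-224"  # assumed checkpoint
processor = AutoProcessor.from_pretrained(CHECKPOINT)
siglip = AutoModel.from_pretrained(CHECKPOINT)

def score_captions(image: Image.Image, captions: list[str]) -> list[float]:
    """Score how well each caption matches the image (higher is better)."""
    inputs = processor(text=captions, images=image,
                       padding="max_length", return_tensors="pt")
    with torch.no_grad():
        logits = siglip(**inputs).logits_per_image  # shape: (1, num_captions)
    return logits.squeeze(0).tolist()

def refine_captions(image: Image.Image, generate_captions, rounds: int = 5, keep: int = 3) -> list[str]:
    """Ask the LLM for captions, keep the best-scoring ones, and stop when scores plateau."""
    captions = generate_captions(seed_captions=None)    # first, blind pass from the text-only LLM
    best_so_far = float("-inf")
    top = captions[:keep]
    for _ in range(rounds):
        scores = score_captions(image, captions)
        ranked = sorted(zip(scores, captions), reverse=True)[:keep]
        if ranked[0][0] <= best_so_far:                  # captions stopped improving
            break
        best_so_far = ranked[0][0]
        top = [caption for _, caption in ranked]
        captions = generate_captions(seed_captions=top)  # ask the LLM to improve on the top captions
    return top
```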

While clever in theory, this approach introduced new problems for our use case:

Persistent hallucinations: The captions themselves sometimes included imaginary damage, which the LLM then confidently reported.

Incomplete coverage: Even with multiple captions, some issues were missed entirely.

Increased complexity, little benefit: The added steps made the system more complicated without reliably outperforming the previous setup.

It was an interesting experiment, but ultimately not a solution.

A creative use of agentic frameworks

This was the turning point. While agentic frameworks are usually used for orchestrating task flows (think agents coordinating calendar invites or customer service actions), we wondered whether breaking the image interpretation task into smaller, specialized agents might help.

We built an agentic framework structured like this (a minimal sketch in code follows the list):

Orchestrator agent: It checked the image and identified which laptop components were visible (screen, keyboard, chassis, ports).

Component agents: Dedicated agents inspected each component for specific damage types; for example, one for cracked screens, another for missing keys.

Junk detection agent: A separate agent flagged whether the image was even a laptop in the first place.
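The sketch below shows that layout in plain, framework-agnostic Python, reusing the hypothetical vision_llm() helper from the monolithic example; the component list, prompt wording and yes/no junk check are illustrative rather than the agents we actually shipped.

```python
# Component prompts are illustrative; each agent gets one narrow question.
COMPONENT_PROMPTS = {
    "screen":   "Inspect only the screen. Report cracks, dead pixels or delamination as JSON.",
    "keyboard": "Inspect only the keyboard. Report missing or damaged keys as JSON.",
    "chassis":  "Inspect only the chassis. Report dents, cracks or broken hinges as JSON.",
    "ports":    "Inspect only the visible ports. Report bent or broken connectors as JSON.",
}

def run_agentic_inspection(image_b64: str) -> dict:
    # Junk-detection agent: bail out early if the photo is not a laptop at all.
    verdict = vision_llm(image_b64, "Does this photo show a laptop? Answer yes or no.")
    if verdict.strip().lower().startswith("no"):
        return {"is_laptop": False, "findings": {}}

    # Orchestrator agent: decide which components are actually visible.
    visible = vision_llm(
        image_b64,
        "Which of these components are visible: screen, keyboard, chassis, ports? "
        "Return a comma-separated list.",
    )
    visible_parts = [p.strip() for p in visible.lower().split(",") if p.strip() in COMPONENT_PROMPTS]

    # Component agents: one focused damage check per visible part.
    findings = {part: vision_llm(image_b64, COMPONENT_PROMPTS[part]) for part in visible_parts}
    return {"is_laptop": True, "findings": findings}
```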

This modular, task-driven approach produced much more precise and explainable results. Hallucinations dropped dramatically, junk images were reliably flagged and each agent's job was simple and focused enough to control quality well.

The blind spots: Trade-offs of an agentic approach

As effective as this was, it was not perfect. Two main limitations showed up:

Increased latency: Running multiple sequential agents added to the total inference time.

Coverage gaps: Agents could only detect issues they were explicitly programmed to look for. If an image showed something unexpected that no agent was tasked with identifying, it would go unnoticed.

We needed a way to balance precision with coverage.

The hybrid solution: Combining agentic and monolithic approaches

To bridge the gaps, we created a hybrid system (a simplified sketch follows these steps):

The agentic framework ran first, handling precise detection of known damage types and junk images. We limited the number of agents to the most essential ones to improve latency.

Then, a monolithic image LLM prompt scanned the image for anything the agents might have missed.

Finally, we fine-tuned the model using a curated set of images for high-priority use cases, like frequently reported damage scenarios, to further improve accuracy and reliability.
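Putting the stages together looks roughly like this, again reusing the hypothetical helpers from the earlier sketches; the catch-all prompt and return structure are assumptions for illustration.

```python
CATCH_ALL_PROMPT = (
    "Review this laptop photo once more and report any physical damage not already "
    "covered by: cracked screen, missing keys, broken hinge, chassis dent, damaged port. "
    "Respond as JSON; return an empty list if nothing else is visible."
)

def hybrid_inspection(image_b64: str) -> dict:
    # Stage 1: agentic pass -- precise, explainable, and able to reject junk images.
    report = run_agentic_inspection(image_b64)
    if not report["is_laptop"]:
        return report

    # Stage 2: monolithic catch-all pass for damage types no agent covers.
    report["additional_findings"] = vision_llm(image_b64, CATCH_ALL_PROMPT)

    # Stage 3, fine-tuning on curated high-priority images, happens offline and is not shown here.
    return report
```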

This combination gave us the precision and explainability of the agentic setup, the broad coverage of monolithic prompting and the confidence boost of targeted fine-tuning.

What we learned

A few things became clear by the time we wrapped up this project:

Agentic frameworks are more versatile than they get credit for: While they're usually associated with workflow management, we found they could meaningfully improve model performance when applied in a structured, modular way.

Mixing different approaches beats relying on just one: The combination of precise, agent-based detection, the broad coverage of LLMs and a bit of fine-tuning where it mattered most gave us far more reliable results than any single method on its own.

Visual models are prone to hallucinations: Even the more advanced setups can jump to conclusions or see things that aren't there. It takes thoughtful system design to keep these errors in check.

Image quality variety makes a difference: Training and testing with both clean, high-resolution images and everyday, lower-quality ones helped the model stay resilient when faced with unpredictable, real-world photos.

You need a way to catch junk images: A dedicated check for junk or unrelated photos was one of the simplest changes we made, and it had an outsized impact on overall system reliability.

Final thoughts

What started as a simple idea, using an LLM prompt to detect physical damage in laptop images, quickly turned into a much deeper experiment in combining different AI techniques to tackle unpredictable, real-world problems. Along the way, we learned that some of the most useful tools were ones not originally designed for this kind of work.

Agentic frameworks, often seen as workflow utilities, proved surprisingly effective when repurposed for tasks like structured damage detection and image filtering. With a bit of creativity, they helped us build a system that was not just more accurate, but easier to understand and manage in practice.

Shruti Tiwari is an AI product manager at Dell Technologies.

Vadiraj Kulkarni is a data scientist at Dell Technologies.
