As enterprises increasingly look to build and deploy generative AI-powered applications and services for internal or external use (employees or customers), one of the hardest questions they face is understanding exactly how well these AI tools are performing out in the wild.
In fact, a recent survey by consulting firm McKinsey and Company found that only 27% of 830 respondents said their enterprises reviewed all of the outputs of their generative AI systems before they went out to users.
Unless a user actually writes in with a complaint, how is a company to know whether its AI product is behaving as expected and designed?
Raindrop, formerly known as Dawn AI, is a new startup tackling the problem head-on, positioning itself as the first observability platform purpose-built for AI in production, catching errors as they happen and explaining to enterprises what went wrong and why. The goal? Help solve generative AI’s so-called “black box problem.”
“AI products fail constantly—in ways both hilarious and terrifying,” co-founder Ben Hylak wrote recently on X. “Regular software throws exceptions. But AI products fail silently.”
Raindrop seeks to offer a category-defining tool akin to what observability company Sentry provides for traditional software.
Traditional exception-monitoring tools don’t capture the nuanced misbehaviors of large language models or AI companions, and Raindrop attempts to fill that gap.
“In traditional software, you have tools like Sentry and Datadog to tell you what’s going wrong in production,” Hylak told VentureBeat in a video call interview last week. “With AI, there was nothing.”
Until now, of course.
How Raindrop works
Raindrop offers a set of tools that let teams at enterprises large and small detect, analyze, and respond to AI issues in real time.
The platform sits at the intersection of user interactions and model outputs, analyzing patterns across hundreds of millions of daily events, all with SOC 2 encryption controls enabled to protect the data and privacy of both end users and the company offering the AI solution.
“Raindrop sits where the user is,” Hylak explained. “We analyze their messages, plus signals like thumbs up/down, build errors, or whether they deployed the output, to infer what’s actually going wrong.”
Raindrop uses a machine learning pipeline that combines LLM-powered summarization with smaller, bespoke classifiers optimized for scale.
Promotional screenshot of Raindrop’s dashboard. Credit: Raindrop.ai
“Our ML pipeline is one of the most complex I’ve seen,” Hylak said. “We use large LLMs for early processing, then train small, efficient models to run at scale on hundreds of millions of events daily.”
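To make that two-stage pattern concrete, here is a minimal, hypothetical sketch, not Raindrop's actual code: an expensive LLM labels a small sample of events offline, and a cheap classifier is distilled from those labels to run at scale. The example events, labels, and model choice are all invented for illustration.

```python
# Illustrative sketch of LLM-to-small-model distillation, an assumption
# about the general pattern described above, not Raindrop's pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stage 1 (offline, expensive): in production these labels would come from
# a large LLM classifying each event; they are hard-coded here so the
# sketch runs on its own.
labeled_sample = [
    ("the upload failed again, this is useless", "user_frustration"),
    ("error: build did not complete", "task_failure"),
    ("I can't help with that request", "refusal"),
    ("thanks, that worked perfectly", "ok"),
    ("why does it keep forgetting my settings?", "user_frustration"),
    ("deploy succeeded on the first try", "ok"),
]
texts, labels = zip(*labeled_sample)

# Stage 2 (online, cheap): a small model trained on the LLM's labels can
# then classify hundreds of millions of events at a fraction of the cost.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(texts, labels)

print(clf.predict(["the app crashed and lost my work"]))
```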
Customers can track signals like user frustration, task failures, refusals, and memory lapses. Raindrop uses feedback signals such as thumbs-downs, user corrections, or follow-up behavior (like failed deployments) to identify issues.
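As a rough illustration of how such feedback signals might be combined into something actionable, consider the sketch below; the field names, weights, and threshold are assumptions for illustration, not Raindrop's scoring model.

```python
# Hypothetical sketch: turning raw feedback signals into an issue score.
from dataclasses import dataclass

@dataclass
class FeedbackEvent:
    thumbs_down: bool = False
    user_corrected: bool = False   # user rephrased or edited the output
    deploy_failed: bool = False    # downstream action (e.g. a deploy) failed

def issue_score(e: FeedbackEvent) -> float:
    # Weights are invented for illustration; a real system would learn them.
    return 1.0 * e.thumbs_down + 0.6 * e.user_corrected + 0.8 * e.deploy_failed

events = [FeedbackEvent(thumbs_down=True, deploy_failed=True),
          FeedbackEvent(user_corrected=True)]
flagged = [e for e in events if issue_score(e) >= 1.0]
print(len(flagged), "event(s) crossed the alert threshold")
```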
Fellow Raindrop co-founder and CEO Zubin Singh Koticha told VentureBeat in the same interview that while many enterprises relied on evaluations, benchmarks, and unit tests to check the reliability of their AI solutions, very little was designed to test AI outputs in production.
“Imagine in traditional coding if you’re like, ‘Oh, my software passes ten unit tests. It’s great. It’s a robust piece of software.’ That’s obviously not how it works,” Koticha said. “It’s a similar problem we’re trying to solve here, where in production, there isn’t actually a lot that tells you: is it working extremely well? Is it broken or not? And that’s where we fit in.”
For enterprises in highly regulated industries, or those seeking additional levels of privacy and control, Raindrop offers Notify, a fully on-premises, privacy-first version of the platform aimed at enterprises with strict data-handling requirements.
Unlike traditional LLM logging tools, Notify performs redaction both client-side via SDKs and server-side with semantic tools. It stores no persistent data and keeps all processing within the customer’s infrastructure.
Raindrop Notify provides daily usage summaries and surfaces high-signal issues directly within workplace tools like Slack and Teams, without requiring cloud logging or complex DevOps setups.
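A minimal sketch of what client-side redaction can look like, assuming simple pattern-based PII scrubbing before any event leaves the customer's infrastructure; the patterns and function below are illustrative assumptions, not Raindrop's SDK.

```python
# Hypothetical client-side redaction step: strip obvious PII locally so
# only sanitized text is ever logged or sent onward.
import re

REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),       # email addresses
    (re.compile(r"\b(?:\d[ -]?){13,16}\b"), "<CARD_NUMBER>"),  # card-like digit runs
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),           # US SSN format
]

def redact(text: str) -> str:
    """Replace matched PII patterns with placeholders, in order."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Contact jane.doe@example.com, card 4111 1111 1111 1111"))
```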
Advanced error identification and precision
Identifying errors, especially with AI models, is far from simple.
“What’s hard in this space is that every AI application is different,” said Hylak. “One customer might build a spreadsheet tool, another an alien companion. What ‘broken’ looks like varies wildly between them.” That variability is why Raindrop’s system adapts to each product individually.
Every AI product Raindrop monitors is treated as unique. The platform learns the shape of the data and the behavioral norms of each deployment, then builds a dynamic issue ontology that evolves over time.
“Raindrop learns the data patterns of each product,” Hylak explained. “It starts with a high-level ontology of common AI issues—things like laziness, memory lapses, or user frustration—and then adapts those to each app.”
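As a rough illustration of that idea, a shared base ontology extended with app-specific failure modes might look like the following sketch; the structure and issue names are assumptions, not Raindrop's schema.

```python
# Hypothetical base ontology of common AI issues, specialized per app.
BASE_ONTOLOGY = {
    "laziness": "model gives a truncated or low-effort answer",
    "memory_lapse": "model forgets earlier context it was given",
    "user_frustration": "user signals annoyance or repeats themselves",
}

def specialize(base: dict, app_issues: dict) -> dict:
    """Extend the shared ontology with app-specific failure modes."""
    return {**base, **app_issues}

# e.g. a coding assistant adds failure modes a chat companion would never hit
coding_assistant = specialize(BASE_ONTOLOGY, {
    "forgotten_variable": "generated code references an undefined variable",
    "broken_build": "user reports the suggested change failed to compile",
})
print(sorted(coding_assistant))
```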
Whether it’s a coding assistant that forgets a variable, an AI alien companion that suddenly refers to itself as a human from the U.S., or even a chatbot that starts randomly citing claims of “white genocide” in South Africa, Raindrop aims to surface these issues with actionable context.
The notifications are designed to be lightweight and timely. Teams receive Slack or Microsoft Teams alerts when something unusual is detected, complete with suggestions on how to reproduce the problem.
Over time, this allows AI developers to fix bugs, refine prompts, and even identify systemic flaws in how their applications respond to users.
“We classify millions of messages a day to find issues like broken uploads or user complaints,” said Hylak. “It’s all about surfacing patterns strong and specific enough to warrant a notification.”
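That alerting step could be sketched as follows, using Slack's standard incoming-webhook JSON format; the threshold, message fields, and function are invented for illustration, not taken from Raindrop.

```python
# Hypothetical sketch: post an alert to Slack once a pattern is strong
# and specific enough to warrant interrupting someone.
import json
import urllib.request

ALERT_THRESHOLD = 50  # e.g. distinct users hitting the same issue today

def notify_slack(webhook_url: str, issue: str, count: int) -> None:
    if count < ALERT_THRESHOLD:
        return  # not yet a pattern worth a notification
    payload = {
        "text": f"Alert: {count} users hit '{issue}' today; "
                "sample conversations and repro steps in the dashboard."
    }
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# notify_slack("https://hooks.slack.com/services/...", "broken_upload", 73)
```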
From Sidekick to Raindrop
The company’s origin story is rooted in hands-on experience. Hylak, who previously worked on human interface design for visionOS at Apple and on avionics software engineering at SpaceX, began exploring AI after encountering GPT-3 in its early days back in 2020.
“As soon as I used GPT-3—just a simple text completion—it blew my mind,” he recalled. “I instantly thought, ‘This is going to change how people interact with technology.’”
Alongside fellow co-founders Koticha and Alexis Gauba, Hylak initially built Sidekick, a VS Code extension with hundreds of paying users.
But building Sidekick revealed a deeper problem: debugging AI products in production was nearly impossible with the tools available.
“We started by building AI products, not infrastructure,” Hylak explained. “But pretty quickly, we saw that to grow anything serious, we needed tooling to understand AI behavior—and that tooling didn’t exist.”
What started as an annoyance quickly evolved into the company’s core focus. The team pivoted, building out tools to make sense of AI product behavior in real-world settings.
In the process, they discovered they weren’t alone. Many AI-native companies lacked visibility into what their users were actually experiencing and why things were breaking. With that, Raindrop was born.
Raindrop’s pricing, differentiation and flexibility have attracted a range of initial customers
Raindrop’s pricing is designed to accommodate teams of various sizes.
A Starter plan is available at $65/month with metered usage pricing. The Pro tier, which includes custom topic tracking, semantic search, and on-prem options, starts at $350/month and requires direct engagement.
While observability tools are not new, most existing offerings were built before the rise of generative AI.
Raindrop sets itself apart by being AI-native from the ground up. “Raindrop is AI-native,” Hylak said. “Most observability tools were built for traditional software. They weren’t designed to handle the unpredictability and nuance of LLM behavior in the wild.”
This specificity has attracted a growing set of customers, including teams at Clay.com, Tolen, and New Computer.
Raindrop’s customers span a range of AI verticals, from code generation tools to immersive AI storytelling companions, each requiring a different lens on what “misbehavior” looks like.
Born from necessity
Raindrop’s rise illustrates how the tools for building AI need to evolve alongside the models themselves. As companies ship more AI-powered features, observability becomes essential, not just to measure performance but to detect hidden failures before users escalate them.
In Hylak’s words, Raindrop is doing for AI what Sentry did for web apps, except that the stakes now include hallucinations, refusals, and misaligned intent. With its rebrand and product expansion, Raindrop is betting that the next generation of software observability will be AI-first by design.