As demand for large-scale AI deployment skyrockets, the lesser-known, privately held chip startup Positron is positioning itself as a direct challenger to market leader Nvidia by offering dedicated, energy-efficient, memory-optimized inference chips aimed at relieving the industry’s mounting cost, power, and availability bottlenecks.
“A key differentiator is our ability to run frontier AI models with better efficiency—achieving 2x to 5x performance per watt and dollar compared to Nvidia,” said Thomas Sohmers, Positron co-founder and CTO, in a recent video call interview with VentureBeat.
“We build chips that can be deployed in hundreds of existing data centers because they don’t require liquid cooling or extreme power densities,” added Mitesh Agrawal, Positron’s CEO and the former chief operating officer of AI cloud inference provider Lambda, in the same interview.
Venture capitalists and early customers seem to agree.
Positron yesterday announced an oversubscribed $51.6 million Series A funding round led by Valor Equity Partners, Atreides Management and DFJ Growth, with support from Flume Ventures, Resilience Reserve, 1517 Fund and Until.
As for Positron’s early customer base, it includes both name-brand enterprises and companies operating in inference-heavy sectors. Confirmed deployments include the major security and cloud content networking provider Cloudflare, which uses Positron’s Atlas hardware in its globally distributed, power-constrained data centers, and Parasail, through its AI-native data infrastructure platform SnapServe.
Beyond these, Positron reports adoption across several key verticals where efficient inference is critical, including networking, gaming, content moderation, content delivery networks (CDNs), and token-as-a-service providers.
These early customers are reportedly drawn by Atlas’s ability to deliver high throughput at lower power consumption without requiring specialized cooling or reworked infrastructure, making it an attractive drop-in option for AI workloads across enterprise environments.
Entering a challenging market that’s shrinking AI model sizes and increasing efficiency
But Positron is also entering a challenging market. The Information just reported that rival buzzy AI inference chip startup Groq — where Sohmers previously worked as Director of Technology Strategy — has cut its 2025 revenue projection from more than $2 billion to $500 million, highlighting just how volatile the AI hardware space can be.
Even well-funded firms face headwinds as they compete for data center capacity and enterprise mindshare against entrenched GPU providers like Nvidia, not to mention the elephant in the room: the rise of more efficient, smaller large language models (LLMs) and specialized small language models (SLMs) that can run even on devices as small and low-powered as smartphones.
Yet for now, Positron’s leadership is embracing the trend and shrugging off any possible impact on its growth trajectory.
“There’s always been this duality—lightweight applications on local devices and heavyweight processing in centralized infrastructure,” said Agrawal. “We believe both will keep growing.”
Sohmers agreed, stating: “We see a future where every person might have a capable model on their phone, but those will still rely on large models in data centers to generate deeper insights.”
Atlas is an inference-first AI chip
While Nvidia GPUs helped catalyze the deep learning boom by accelerating model training, Positron argues that inference — the stage where models generate output in production — is now the true bottleneck.
Its founders call it the most under-optimized part of the “AI stack,” especially for generative AI workloads that depend on fast, efficient model serving.
Positron’s answer is Atlas, its first-generation inference accelerator built specifically to handle large transformer models.
Unlike general-purpose GPUs, Atlas is optimized for the distinct memory and throughput demands of modern inference tasks.
The company claims Atlas delivers 3.5x better performance per dollar and up to 66% lower power usage than Nvidia’s H100, while also achieving 93% memory bandwidth utilization — far above the typical 10–30% range seen in GPUs.
From Atlas to Titan, supporting multi-trillion-parameter models
Launched just 15 months after the company’s founding — and with only $12.5 million in seed capital — Atlas is already shipping and in production.
The system supports models of up to 0.5 trillion parameters in a single 2kW server and is compatible with Hugging Face transformer models via an OpenAI API-compatible endpoint.
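An OpenAI API-compatible endpoint means standard client code should work against Atlas unchanged. As a minimal sketch — the base URL, API key, and model name below are hypothetical placeholders, not documented Positron values:

```python
# Minimal sketch of targeting an OpenAI-compatible inference endpoint.
# BASE_URL, the API key, and the model name are hypothetical placeholders;
# substitute the values for your own deployment.
import json
import urllib.request

BASE_URL = "http://atlas.example.internal/v1"  # hypothetical endpoint

def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a standard /chat/completions request; any OpenAI-compatible
    server accepts this payload shape."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer YOUR_API_KEY",
        },
    )

req = build_chat_request("meta-llama/Llama-3.1-70B-Instruct", "Hello!")
# urllib.request.urlopen(req) would send it; omitted here because the
# endpoint above is a placeholder.
```

Because the wire format matches OpenAI’s, existing SDKs and tooling only need the base URL swapped, which is the substance of the drop-in claim.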
Positron is now preparing to launch its next-generation platform, Titan, in 2026.
Built on custom-designed “Asimov” silicon, Titan will feature up to two terabytes of high-speed memory per accelerator and support models of up to 16 trillion parameters.
Today’s frontier models range from hundreds of billions to single-digit trillions of parameters. Newer models like OpenAI’s GPT-5 are presumed to be in the multi-trillions, and still larger models are currently thought to be required to reach artificial general intelligence (AGI), AI that outperforms humans at most economically valuable work, and superintelligence, AI that exceeds humans’ ability to understand and control it.
Crucially, Titan is designed to operate with standard air cooling in conventional data center environments, avoiding the high-density, liquid-cooled configurations that next-generation GPUs increasingly require.
Engineering for efficiency and compatibility
From the start, Positron designed its system to be a drop-in replacement, letting customers use existing model binaries without code rewrites.
“If a customer had to change their behavior or their actions in any way, shape or form, that was a barrier,” said Sohmers.
Sohmers explained that instead of building a complex compiler stack or rearchitecting software ecosystems, Positron focused narrowly on inference, designing hardware that ingests Nvidia-trained models directly.
“The CUDA moat isn’t something to fight,” said Agrawal. “It’s an ecosystem to participate in.”
This pragmatic approach helped the company ship its first product quickly, validate performance with real enterprise users, and secure significant follow-on funding. In addition, its focus on air cooling rather than liquid cooling makes its Atlas chips the only option for some deployments.
“We’re focused entirely on purely air-cooled deployments… all these Nvidia Hopper- and Blackwell-based solutions going forward require liquid cooling… The only place you can put those racks is in data centers that are being newly built now in the middle of nowhere,” said Sohmers.
All told, Positron’s ability to execute quickly and capital-efficiently has helped distinguish it in a crowded AI hardware market.
Memory is what you need
Sohmers and Agrawal point to a fundamental shift in AI workloads: from compute-bound convolutional neural networks to memory-bound transformer architectures.
While older models demanded high FLOPs (floating-point operations), modern transformers require massive memory capacity and bandwidth to run efficiently.
As Nvidia and others continue to focus on compute scaling, Positron is betting on memory-first design.
Sohmers noted that with transformer inference, the ratio of compute to memory operations flips to nearly 1:1, meaning that boosting memory utilization has a direct and dramatic impact on performance and power efficiency.
With Atlas already outperforming contemporary GPUs on key efficiency metrics, Titan aims to go further by offering the highest memory capacity per chip in the industry.
At launch, Titan is expected to offer an order-of-magnitude increase over typical GPU memory configurations — without demanding specialized cooling or boutique networking setups.
U.S.-built chips
Positron’s manufacturing pipeline is proudly domestic. The company’s first-generation chips were fabricated in the U.S. at Intel facilities, with final server assembly and integration also based domestically.
For the Asimov chip, fabrication will shift to TSMC, though the team aims to keep as much of the rest of the manufacturing chain in the U.S. as possible, depending on foundry capacity.
Geopolitical resilience and supply chain stability are becoming key purchasing criteria for many customers — another reason Positron believes its U.S.-made hardware offers a compelling alternative.
What’s next?
Agrawal noted that Positron’s silicon targets not just broad compatibility but maximum utility for enterprise, cloud, and research labs alike.
While the company has not yet named any frontier model providers as customers, he confirmed that outreach and conversations are underway.
Agrawal emphasized that selling physical infrastructure on its economics and performance — rather than bundling it with proprietary APIs or business models — is part of what gives Positron credibility in a skeptical market.
“If you can’t convince a customer to deploy your hardware based on its economics, you’re not going to be profitable,” he said.

