Pipeshift cuts GPU usage for AI inference by 75% with modular inference engine
Technology

Editorial Board | Published January 23, 2025 | Last updated: January 23, 2025, 6:29 p.m.

DeepSeek’s launch of R1 this week was a watershed moment in the field of AI. No one thought a Chinese startup would be the first to drop a reasoning model matching OpenAI’s o1 and open-source it (in line with OpenAI’s original mission) at the same time.

Enterprises can easily download R1’s weights via Hugging Face, but access has never been the problem — over 80% of teams are using or planning to use open models. Deployment is the real culprit. If you go with hyperscaler services, like Vertex AI, you’re locked into a specific cloud. On the other hand, if you go solo and build in-house, there’s the challenge of resource constraints, as you have to set up a dozen different components just to get started, let alone optimize or scale downstream.

To address this challenge, Y Combinator- and SenseAI-backed Pipeshift is launching an end-to-end platform that allows enterprises to train, deploy and scale open-source generative AI models — LLMs, vision models, audio models and image models — across any cloud or on-prem GPUs. The company is competing in a rapidly growing field that includes Baseten, Domino Data Lab, Together AI and Simplismart.

The key value proposition? Pipeshift uses a modular inference engine that can quickly be optimized for speed and efficiency, helping teams not only deploy 30 times faster but achieve more with the same infrastructure, leading to as much as 60% cost savings.

Imagine running inference worth four GPUs with just one.

The orchestration bottleneck

When you have to run different models, stitching together a functional MLOps stack in-house — from accessing compute, training and fine-tuning to production-grade deployment and monitoring — becomes the problem. You have to set up 10 different inference components and instances to get things up and running, and then put in thousands of engineering hours for even the smallest of optimizations.

“There are multiple components of an inference engine,” Arko Chattopadhyay, cofounder and CEO of Pipeshift, told VentureBeat. “Every combination of these components creates a distinct engine with varying performance for the same workload. Identifying the optimal combination to maximize ROI requires weeks of repetitive experimentation and fine-tuning of settings. In most cases, the in-house teams can take years to develop pipelines that can allow for the flexibility and modularization of infrastructure, pushing enterprises behind in the market alongside accumulating massive tech debts.”
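To make that combinatorics concrete, here is a minimal sketch of how a handful of plug-and-play choices multiply into many distinct engines. The component axes and option names below are hypothetical illustrations, not MAGIC’s actual modules:

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical component axes; names are illustrative, not Pipeshift's real modules.
@dataclass(frozen=True)
class EngineConfig:
    runtime: str      # kernel/runtime backend
    batching: str     # how incoming requests are grouped
    kv_cache: str     # KV-cache management strategy
    parallelism: str  # how the model is split across devices

OPTIONS = {
    "runtime": ["cuda-graphs", "triton-kernels"],
    "batching": ["static", "continuous"],
    "kv_cache": ["contiguous", "paged"],
    "parallelism": ["single-gpu", "tensor-parallel"],
}

# Every combination of options is a distinct engine with its own performance profile.
engines = [EngineConfig(*combo) for combo in product(*OPTIONS.values())]
print(len(engines))  # 16 distinct engines from just two options per component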

While there are startups that offer platforms to deploy open models across cloud or on-premise environments, Chattopadhyay says most of them are GPU brokers, offering one-size-fits-all inference solutions. As a result, they maintain separate GPU instances for different LLMs, which doesn’t help when teams want to save costs and optimize for performance.

To fix this, Chattopadhyay started Pipeshift and developed a framework called modular architecture for GPU-based inference clusters (MAGIC), aimed at distributing the inference stack into different plug-and-play pieces. The work created a Lego-like system that allows teams to configure the right inference stack for their workloads, without the hassle of infrastructure engineering.

This way, a team can quickly add or interchange different inference components to piece together a customized inference engine that can extract more out of existing infrastructure to meet expectations for cost, throughput and even scalability.

For instance, a team could set up a unified inference system, where multiple domain-specific LLMs could run with hot-swapping on a single GPU, using it to its full capacity.
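Pipeshift hasn’t published MAGIC’s internals, but a rough open-source analogue of this pattern is multi-adapter serving, where several fine-tunes share one base model on one GPU. A minimal sketch using vLLM’s LoRA support (the adapter names and paths are hypothetical, and this is not Pipeshift’s stack):

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# One base model resident on a single GPU; lightweight adapters swap per request.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_lora=True, max_loras=4)
params = SamplingParams(temperature=0.0, max_tokens=256)

# Hypothetical domain-specific fine-tunes, e.g. for a retailer's workflows.
support_adapter = LoRARequest("support", 1, "/adapters/customer-support")
docs_adapter = LoRARequest("docs", 2, "/adapters/document-processing")

llm.generate(["Classify this support ticket: ..."], params, lora_request=support_adapter)
llm.generate(["Extract fields from this invoice: ..."], params, lora_request=docs_adapter)
```

Note that this sketch only covers adapter hot-swapping; MAGIC’s claim of serving full fine-tuned models in parallel without memory partitioning or degradation goes beyond what this example demonstrates.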

Running four GPU workloads on one

Since claiming to offer a modular inference solution is one thing and delivering on it is quite another, Pipeshift’s founder was quick to point out the benefits of the company’s offering.

“In terms of operational expenses…MAGIC allows you to run LLMs like Llama 3.1 8B at >500 tokens/sec on a given set of Nvidia GPUs without any model quantization or compression,” he said. “This unlocks a massive reduction of scaling costs as the GPUs can now handle workloads that are an order of magnitude 20-30 times what they originally were able to achieve using the native platforms offered by the cloud providers.”

The CEO noted that the company is already working with 30 companies on an annual license-based model.

One of these is a Fortune 500 retailer that originally used four independent GPU instances to run four open fine-tuned models for its automated support and document processing workflows. Each of these GPU clusters was scaling independently, adding massive cost overheads.

“Large-scale fine-tuning was not possible as datasets became larger and all the pipelines were supporting single-GPU workloads while requiring you to upload all the data at once. Plus, there was no auto-scaling support with tools like AWS Sagemaker, which made it hard to ensure optimal use of infra, pushing the company to pre-approve quotas and reserve capacity beforehand for theoretical scale that only hit 5% of the time,” Chattopadhyay noted.

After moving to Pipeshift’s modular architecture, all the fine-tunes were brought down to a single GPU instance that served them in parallel, without any memory partitioning or model degradation. This cut the requirement for these workloads from four GPUs to just one.

“Without additional optimizations, we were able to scale the capabilities of the GPU to a point where it was serving five-times-faster tokens for inference and could handle a four-times-higher scale,” the CEO added. In all, he said the company saw a 30-times-faster deployment timeline and a 60% reduction in infrastructure costs.
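The arithmetic connecting these figures to the headline is simple; a back-of-the-envelope check, not data from Pipeshift:

```python
gpus_before, gpus_after = 4, 1
print(f"GPU reduction: {1 - gpus_after / gpus_before:.0%}")  # 75%, the headline figure

# The reported cost saving (60%) is lower than the GPU reduction (75%); one plausible
# reading, not broken down in the article, is that some infrastructure costs
# (networking, storage, licenses) don't shrink with GPU count.
```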

With its modular architecture, Pipeshift wants to position itself as the go-to platform for deploying all cutting-edge open-source AI models, including DeepSeek R1.

However, it won’t be an easy ride as competitors continue to evolve their offerings.

For instance, Simplismart, which raised $7 million a few months ago, is taking a similar software-optimized approach to inference. Cloud service providers like Google Cloud and Microsoft Azure are also bolstering their respective offerings, although Chattopadhyay thinks these CSPs will be more like partners than competitors in the long run.

“We are a platform for tooling and orchestration of AI workloads, like Databricks has been for data intelligence,” he explained. “In most scenarios, most cloud service providers will turn into growth-stage GTM partners for the kind of value their customers will be able to derive from Pipeshift on their AWS/GCP/Azure clouds.”

In the coming months, Pipeshift will also introduce tools to help teams build and scale their datasets, along with model evaluation and testing. This will speed up the experimentation and data preparation cycle, enabling customers to leverage orchestration more efficiently.

