NEW YORK DAWN™
A new, open source text-to-speech model called Dia has arrived to challenge ElevenLabs, OpenAI and more
Technology

Last updated: April 22, 2025 9:46 pm
Editorial Board Published April 22, 2025

A two-person startup by the name of Nari Labs has introduced Dia, a 1.6 billion parameter text-to-speech (TTS) model designed to produce naturalistic dialogue directly from text prompts, and one of its creators claims it surpasses the performance of competing proprietary offerings from the likes of ElevenLabs and Google’s hit NotebookLM AI podcast generation product.

It could also threaten uptake of OpenAI’s recent gpt-4o-mini-tts.

“Dia rivals NotebookLM’s podcast feature while surpassing ElevenLabs Studio and Sesame’s open model in quality,” said Toby Kim, one of the co-creators of Nari and Dia, in a post from his account on the social network X.

In a separate post, Kim noted that the model was built with “zero funding,” and added across a thread: “…we were not AI experts from the beginning. It all started when we fell in love with NotebookLM’s podcast feature when it was released last year. We wanted more: more control over the voices, more freedom in the script. We tried every TTS API on the market. None of them sounded like real human conversation.”

Kim further credited Google for giving him and his collaborator access to the company’s Tensor Processing Unit chips (TPUs) for training Dia through Google’s Research Cloud.

Dia’s code and weights (the internal model connection set) are now available for download and local deployment by anyone from Hugging Face or GitHub. Individual users can try generating speech from it on a Hugging Face Space.

Advanced controls and more customizable features

Dia supports nuanced features like emotional tone, speaker tagging, and nonverbal audio cues, all from plain text.

Users can mark speaker turns with tags like [S1] and [S2], and include cues like (laughs), (coughs), or (clears throat) to enrich the resulting dialogue with nonverbal behaviors.

These tags are correctly interpreted by Dia during generation, something not reliably supported by other available models, according to the company’s examples page.
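
The tagging convention described above can be sketched in plain Python. This is only an illustration of assembling a tagged script as text: the helper names `build_script` and `find_cues` are hypothetical, and joining turns with single spaces and accepting only [S1] and [S2] are assumptions, not documented requirements of Dia.

```python
import re

# Nonverbal cues named in the article; Dia may accept others (not confirmed here).
KNOWN_CUES = {"laughs", "coughs", "clears throat"}

# Only the two speaker tags mentioned in the article are accepted (assumption).
SPEAKER_TAG = re.compile(r"^\[S[12]\]$")


def build_script(turns):
    """Join (speaker_tag, text) pairs into one tagged script string."""
    parts = []
    for tag, text in turns:
        if not SPEAKER_TAG.match(tag):
            raise ValueError(f"unexpected speaker tag: {tag!r}")
        parts.append(f"{tag} {text.strip()}")
    return " ".join(parts)


def find_cues(script):
    """Return the parenthesized cues found in a script, in order of appearance."""
    return re.findall(r"\(([^()]+)\)", script)


script = build_script([
    ("[S1]", "Did you hear about the new open source speech model?"),
    ("[S2]", "I did. (laughs) It sounds surprisingly human."),
    ("[S1]", "(clears throat) Let me read you an example."),
])
print(script)
print([cue for cue in find_cues(script) if cue in KNOWN_CUES])
```

The resulting string is the kind of plain-text prompt the article describes; how it is passed to the model depends on Nari Labs’ own library or demo.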

The model is currently English-only and not tied to any single speaker’s voice, producing different voices per run unless users fix the generation seed or provide an audio prompt. Audio conditioning, or voice cloning, lets users guide speech tone and voice likeness by uploading a sample clip.
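
Fixing the generation seed usually means seeding every random number generator in play before generating. The sketch below shows the common pattern for a PyTorch-based project; the helper name `set_seed` is hypothetical, and whether this alone is enough to pin Dia’s voice across runs is an assumption rather than something the article confirms.

```python
import random


def set_seed(seed):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is optional for this sketch
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is optional for this sketch


# Re-seeding with the same value reproduces the same random draw.
set_seed(42)
first = random.random()
set_seed(42)
second = random.random()
print(first == second)
```

Calling such a helper with the same value immediately before each generation is the usual way to make sampling repeatable; the alternative the article mentions, an audio prompt, conditions the voice on a sample clip instead.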

Nari Labs offers example code to facilitate this process and a Gradio-based demo so users can try it without setup.

Comparison with ElevenLabs and Sesame

Nari offers a number of example audio files generated by Dia on its Notion website, comparing it to other leading text-to-speech rivals, specifically ElevenLabs Studio and Sesame CSM-1B, the latter a new text-to-speech model from Oculus VR headset co-creator Brendan Iribe that went somewhat viral on X earlier this year.

Side-by-side examples shared by Nari Labs show how Dia outperforms the competition in several areas:

In standard dialogue scenarios, Dia handles both natural timing and nonverbal expressions better. For example, in a script ending with (laughs), Dia interprets and delivers actual laughter, whereas ElevenLabs and Sesame output textual substitutions like “haha”.

In multi-turn conversations with emotional range, Dia demonstrates smoother transitions and tone shifts. One test included a dramatic, emotionally charged emergency scene. Dia rendered the urgency and speaker stress effectively, while competing models often flattened delivery or lost pacing.

Dia uniquely handles nonverbal-only scripts, such as a humorous exchange involving coughs, sniffs, and laughs. Competing models failed to recognize these tags or skipped them entirely.

Even with rhythmically complex content like rap lyrics, Dia generates fluid, performance-style speech that maintains tempo. This contrasts with more monotone or disjointed outputs from ElevenLabs and Sesame’s 1B model.

Using audio prompts, Dia can extend or continue a speaker’s voice style into new lines. An example using a conversational clip as a seed showed how Dia carried vocal traits from the sample through the rest of the scripted dialogue. This feature isn’t robustly supported in other models.

In one set of tests, Nari Labs noted that Sesame’s best website demo likely used an internal 8B version of the model rather than the public 1B checkpoint, resulting in a gap between advertised and actual performance.

Model access and tech specs

Developers can access Dia from Nari Labs’ GitHub repository and its Hugging Face model page.

The model runs on PyTorch 2.0+ and CUDA 12.6 and requires about 10GB of VRAM.
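
A quick pre-flight check against these requirements might look like the following. It is a rough sketch: `find_problems` and `parse_version` are hypothetical helpers, "about 10GB" is read as 10 decimal gigabytes (an assumption), and the CUDA toolkit version is not checked at all.

```python
MIN_TORCH = (2, 0)
MIN_VRAM_BYTES = 10 * 1000**3  # "about 10GB" read as decimal GB (assumption)


def parse_version(version):
    """Turn a version string like '2.1.0+cu121' into the tuple (2, 1)."""
    core = version.split("+")[0]
    major, minor = core.split(".")[:2]
    return int(major), int(minor)


def find_problems(torch_version, cuda_available, vram_bytes):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if parse_version(torch_version) < MIN_TORCH:
        problems.append(f"PyTorch {torch_version} is older than 2.0")
    if not cuda_available:
        problems.append("no CUDA device available")
    elif vram_bytes < MIN_VRAM_BYTES:
        problems.append(f"only {vram_bytes / 1000**3:.1f} GB of VRAM")
    return problems


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        has_cuda = torch.cuda.is_available()
        # total_memory is the device's total memory in bytes.
        vram = torch.cuda.get_device_properties(0).total_memory if has_cuda else 0
        print(find_problems(torch.__version__, has_cuda, vram) or "looks OK")
```

Keeping the comparison logic in a pure function makes it easy to test without a GPU; only the final block touches PyTorch.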

Inference on enterprise-grade GPUs like the NVIDIA A4000 delivers roughly 40 tokens per second.
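
To turn a tokens-per-second figure into a wall-clock estimate, one also needs to know how many tokens make up a second of audio, which this article does not state. The sketch below uses 86 tokens per second of audio purely as a labeled assumption; substitute the real figure from the project’s documentation.

```python
def generation_seconds(audio_seconds, tokens_per_audio_second, tokens_per_second=40.0):
    """Estimate wall-clock time needed to generate a clip of the given length."""
    total_tokens = audio_seconds * tokens_per_audio_second
    return total_tokens / tokens_per_second


# ASSUMPTION: tokens per second of audio; not stated in the article.
ASSUMED_TOKENS_PER_AUDIO_SECOND = 86.0

wall = generation_seconds(10.0, ASSUMED_TOKENS_PER_AUDIO_SECOND)
print(f"{wall:.1f} s to generate 10 s of audio")
```

Under that assumption, 10 seconds of audio needs 860 tokens, which at 40 tokens per second takes about 21.5 seconds, or a little over twice real time.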

While the current version only runs on GPU, Nari plans to offer CPU support and a quantized release to improve accessibility.

The startup offers both a Python library and CLI tool to further streamline deployment.

Dia’s flexibility opens use cases from content creation to assistive technologies and synthetic voiceovers.

Fully open source

The model is distributed under a fully open source Apache 2.0 license, which means it can be used for commercial purposes, something that will obviously appeal to enterprises or indie app developers.

Nari Labs explicitly prohibits usage that includes impersonating individuals, spreading misinformation, or engaging in illegal activities. The team encourages responsible experimentation and has taken a stance against unethical deployment.

Dia’s development credits support from the Google TPU Research Cloud, Hugging Face’s ZeroGPU grant program, and prior work on SoundStorm, Parakeet, and Descript Audio Codec.

Nari Labs itself comprises just two engineers, one full-time and one part-time, but they actively invite community contributions through its Discord server and GitHub.

With a clear focus on expressive quality, reproducibility, and open access, Dia adds a distinctive new voice to the landscape of generative speech models.

About US

New York Dawn is a proud and integral publication of the Enspirers News Group, embodying the values of journalistic integrity and excellence.
© 2024 New York Dawn. All Rights Reserved.