NEW YORK DAWN™

Hume launches new text-to-speech model Octave that generates custom AI voices with adjustable emotions

Last updated: February 27, 2025 3:07 pm
Editorial Board Published February 27, 2025

New York City startup Hume AI emerged from stealth two years ago and has since raised millions of dollars in funding on the strength of its technology that creates emotive AI voices for use in enterprise applications.

Today, it’s taking its offerings a step further with a new large-language and speech model called the “Omni-capable text and voice engine,” or Octave for short, designed to produce lifelike, emotionally nuanced speech for use across different forms of content, from audiobooks to prerecorded video game character dialogue and film/TV/video.

Hume claims Octave is the first text-to-speech system powered by a large language model (LLM) trained not only on text but also on speech and emotion tokens, enabling it to understand words in context and adjust tone, rhythm and cadence accordingly. The user can also adjust these qualities at the sentence level with text prompts.

“We’re launching the first LLM for text-to-speech, a model that understands words in context, predicting the right emotions, rhythm, cadence and emphasis, making speech sound more human than ever before,” said Alan Cowen, Hume AI’s cofounder and CEO, in a video call interview with VentureBeat.

Octave’s capabilities go beyond basic voice generation. It can interpret character traits and style from a script alone, adjusting vocal inflections to match implied emotions. A sarcastic remark will be spoken sarcastically, a panicked sentence will sound urgent, and a whispered secret will be hushed, all without needing explicit direction.

In addition, if the user doesn’t like the generated voice or wants to adjust it, they can do so granularly through natural language by simply typing a text instruction to Octave, such as “happier, sadder, more frustrated, angrier, more sarcastic, more sincere,” and so on.

“You can describe a character — like a sarcastic medieval peasant — and the model will instantly create that voice, adjusting emotions like anger, sadness or happiness based on your instructions,” Cowen added. “Voice modulation works at the sentence level, but you can also adjust parts of a sentence, instructing the model to convey nuanced emotions like slight frustration mixed with humor or exasperation.”

The model also considers context beyond individual sentences. “Unlike traditional models that process text word by word, our model considers entire paragraphs, capturing context to deliver more natural and emotionally accurate speech,” he explained.

While the current release focuses on English-language speech, Octave also supports Spanish and is expected to expand its language capabilities in the near future.

Tailored for content creation

Octave is tailored for content creators and media production, offering a wide range of applications.

“This new model is designed for offline text-to-speech, perfect for audiobooks, podcasts, video voiceovers, and video game characters, where creators need realistic, character-specific voices,” Cowen explained.

However, the user must access it through Hume’s website, either on its Projects page or through an application programming interface (API). The “offline” part refers to the fact that this model is designed to produce discrete audio files that can be added to projects such as videos or audiobooks. It’s not designed to carry on real-time conversation, although that could theoretically be done by piping text queries to the website.

Hume’s API allows developers to make up to 50 requests to the new Octave model per minute, with a maximum text length of 5,000 characters and descriptions capped at 1,000 characters. Each request can generate up to five outputs, and the supported audio formats include MP3, WAV and PCM.
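For developers planning around these limits, a client-side pre-flight check might look like the following minimal sketch. It uses only the figures reported above and does not call Hume's API; the names `validate_request` and `RateLimiter` are hypothetical, and the sliding-window approach is an assumption about how a client could pace itself, not a description of how Hume enforces the limit.

```python
import time
from collections import deque

# Limits as reported in this article; verify against Hume's current documentation.
MAX_REQUESTS_PER_MINUTE = 50
MAX_TEXT_CHARS = 5_000
MAX_DESCRIPTION_CHARS = 1_000
MAX_OUTPUTS_PER_REQUEST = 5
SUPPORTED_FORMATS = {"mp3", "wav", "pcm"}


def validate_request(text, description="", num_outputs=1, audio_format="mp3"):
    """Return a list of problems; an empty list means the request fits the limits."""
    problems = []
    if len(text) > MAX_TEXT_CHARS:
        problems.append(f"text is {len(text)} characters; limit is {MAX_TEXT_CHARS}")
    if len(description) > MAX_DESCRIPTION_CHARS:
        problems.append(
            f"description is {len(description)} characters; limit is {MAX_DESCRIPTION_CHARS}"
        )
    if not 1 <= num_outputs <= MAX_OUTPUTS_PER_REQUEST:
        problems.append(f"num_outputs must be between 1 and {MAX_OUTPUTS_PER_REQUEST}")
    if audio_format.lower() not in SUPPORTED_FORMATS:
        problems.append(f"unsupported audio format: {audio_format}")
    return problems


class RateLimiter:
    """Sliding-window limiter: at most `limit` calls in any `window` seconds."""

    def __init__(self, limit=MAX_REQUESTS_PER_MINUTE, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent = deque()

    def try_acquire(self):
        """Record a call and return True if one is allowed now, else return False."""
        now = self.clock()
        # Forget calls that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False
```

A caller would run `validate_request(...)` first, then send the request only when `try_acquire()` returns `True`, waiting and retrying otherwise.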

Hume’s prior EVI series of models allows for streaming, real-time, back-and-forth interactions. They remain available and will continue to be developed.

Hume AI offers a subscription-based pricing model with tiers ranging from a free option to Creator, Creator Pro, and Enterprise plans.

Here’s a concise breakdown of the options:

Free ($0/month) – 10,000 characters of text-to-speech per month (~10 minutes) with unlimited custom voices

Starter ($3/month) – 30,000 characters (~30 minutes) plus support for up to 20 projects

Creator ($10/month) – 100,000 characters (~100 minutes), usage-based pricing for additional characters ($0.20/1,000), and support for up to 1,000 projects

Pro ($50/month) – 500,000 characters (~500 minutes), lower usage-based pricing ($0.15/1,000), and support for up to 3,000 projects

Scale ($150/month) – 2,000,000 characters (~2,000 minutes), further reduced usage-based pricing ($0.13/1,000), and support for up to 10,000 projects

Business ($900/month) – 10,000,000 characters (~10,000 minutes), even lower usage-based pricing ($0.10/1,000), and support for up to 20,000 projects

Enterprise (Custom price) – Unlimited usage, custom legal terms, security assurances, significantly discounted bulk pricing, and priority support
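To compare tiers for a given monthly volume, the arithmetic can be sketched as below. Tiers are keyed by their monthly fee in dollars, and the custom-priced tier is left out because no figures are given for it. Two assumptions are not confirmed by the article: that overage is billed pro rata per 1,000 characters beyond the included allotment, and that tiers with no listed overage price cannot exceed their allotment. The function names are hypothetical.

```python
from typing import Optional, Tuple

# monthly fee (USD) -> (included characters, overage price per 1,000 characters)
# None means the article lists no overage price for that tier.
TIERS = {
    0: (10_000, None),
    3: (30_000, None),
    10: (100_000, 0.20),
    50: (500_000, 0.15),
    150: (2_000_000, 0.13),
    900: (10_000_000, 0.10),
}


def monthly_cost(fee: int, chars_used: int) -> Optional[float]:
    """Estimated monthly bill in USD, or None if the tier cannot cover the usage."""
    included, overage_per_1k = TIERS[fee]
    extra = max(0, chars_used - included)
    if extra == 0:
        return float(fee)
    if overage_per_1k is None:
        return None  # assumption: no overage available on this tier
    # Assumption: overage is charged pro rata per 1,000 extra characters.
    return round(fee + extra / 1000 * overage_per_1k, 2)


def cheapest_tier(chars_used: int) -> Tuple[int, float]:
    """Return (monthly fee of the cheapest tier, estimated total cost)."""
    options = []
    for fee in TIERS:
        cost = monthly_cost(fee, chars_used)
        if cost is not None:
            options.append((cost, fee))
    cost, fee = min(options)
    return fee, cost
```

Under these assumptions, 150,000 characters on the $10 tier comes to $10 plus 50 blocks of 1,000 characters at $0.20, or $20 in total, while 400,000 characters is cheaper on the $50 tier than on the $10 tier with overage.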

Altogether, Hume emphasized that its Octave TTS pricing is around half the cost of the competing service from AI voice creation startup ElevenLabs, showing the intensifying competition in the text-to-speech space.

In addition, Hume AI conducted a blind comparison study with 180 human raters to benchmark Octave against ElevenLabs. Across 120 diverse prompts, the results showed that Octave was preferred in terms of audio quality (71.6% of trials), naturalness (51.7% of trials), and how well the speech matched descriptions of the desired voice (57.7% of trials).

To further evaluate its performance, Hume AI has also launched the Expressive TTS Arena, a public benchmark designed to test how well AI models handle longer, expressive speech, an area that earlier TTS benchmarks have largely ignored.

Tens of trillions of language tokens

Unlike traditional text-to-speech systems that rely on limited speech datasets, Octave TTS is built on an LLM trained on tens of trillions of language tokens.

“Traditional text-to-speech models are trained on limited speech data, but ours is built on an LLM trained on tens of trillions of tokens, enabling it to reason, think, and infer emotions from text,” Cowen said.

The model was trained using tens of millions of hours of public, long-form speech data and Hume AI’s proprietary datasets of new voices recorded by survey participants.

“We collected data from people recording themselves through webcams, reacting naturally to videos, telling stories, and talking to others, including friends and family, to capture a wide range of emotional expressions,” Cowen said.

This extensive training allows the model to infer emotional context and follow detailed instructions, creating voices that match specific character descriptions and attributes.

Consistent character voices and limitations

Octave TTS maintains consistent character voices across long-form content.

“With our platform, you can generate unique voices for each character in an audiobook, like a middle-aged orc, and maintain that character’s voice throughout the story,” Cowen said.

This capability is supported by Hume AI’s “Projects” page, which handles long-form content like audiobooks by automatically chunking text while preserving character consistency and context across chapters.
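How the Projects page splits text is not described, but the general idea of chunking can be illustrated with a minimal sketch that packs whole sentences into pieces no longer than the 5,000-character request limit mentioned earlier. The function `chunk_text` and its sentence-splitting rule are assumptions for illustration only; the sketch covers splitting alone and does nothing to preserve voice or context across chunks.

```python
import re
from typing import List


def chunk_text(text: str, max_chars: int = 5_000) -> List[str]:
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        # A single sentence longer than the limit is hard-split as a fallback.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

For example, `chunk_text("One. Two! Three?", max_chars=10)` yields two chunks, `"One. Two!"` and `"Three?"`, because adding the third sentence would exceed the limit.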

Hume has technical guardrails built into its website and API prohibiting certain uses, but other than that, it’s open to use across a wide range of content and subjects, including potentially not-safe-for-work scenes such as those in popular romance novels.

“We give developers freedom, allowing content across a broad range of human experiences, though we restrict the creation of realistic children’s voices and imitations of specific individuals,” Cowen explained.

In addition, Cowen said that the company may adjust these guardrails for specific clients upon request, such as a children’s-book writer looking to create voices for children’s audiobooks.

Hume AI is working on a forthcoming Voice Cloning feature, which will allow users to replicate a voice from as little as five seconds of audio. The company is developing safeguards to ensure ethical use before rolling out the feature publicly.

With its combination of contextual awareness, emotional expression and character customization, Octave TTS aims to provide content creators with more control and flexibility, delivering voices that sound both realistic and emotionally engaging.
