NEW YORK DAWN™
Hume launches new text-to-speech model Octave that generates custom AI voices with adjustable emotions
Technology

Last updated: February 27, 2025 3:07 pm
Editorial Board Published February 27, 2025

New York City startup Hume AI emerged from stealth two years ago and has since raised millions of dollars in funding on the strength of its technology for creating emotive AI voices for use in enterprise applications.

Today, it’s taking its offerings a step further with a new large language and speech model called the “Omni-capable text and voice engine,” or Octave for short, designed to produce lifelike, emotionally nuanced speech for use across different forms of content, from audiobooks to prerecorded video game character dialogue and film, TV and video.

Hume claims Octave is the first text-to-speech system powered by a large language model (LLM) trained not only on text but also on speech and emotion tokens. This enables it to understand words in context and adjust tone, rhythm and cadence accordingly, and users can further adjust that delivery at the sentence level with text prompts.

“We’re launching the first LLM for text-to-speech: a model that understands words in context, predicting the right emotions, rhythm, cadence and emphasis, making speech sound more human than ever before,” said Alan Cowen, Hume AI’s cofounder and CEO, in a video call interview with VentureBeat.

Octave’s capabilities go beyond basic voice generation. It can interpret character traits and style from a script alone, adjusting vocal inflections to match implied emotions. A sarcastic remark will be spoken sarcastically, a panicked sentence will sound urgent, and a whispered secret will be hushed, all without the need for explicit direction.

In addition, if users don’t like the generated voice or want to adjust it, they can do so granularly in natural language by simply typing a text instruction to Octave, such as “happier, sadder, more frustrated, angrier, more sarcastic, more sincere,” and so on.

“You can describe a character — like a sarcastic medieval peasant — and the model will instantly create that voice, adjusting emotions like anger, sadness or happiness based on your instructions,” Cowen added. “Voice modulation works at the sentence level, but you can also adjust parts of a sentence, instructing the model to convey nuanced emotions like slight frustration mixed with humor or exasperation.”

The model also considers context beyond individual sentences. “Unlike traditional models that process text word by word, our model considers entire paragraphs, capturing context to deliver more natural and emotionally accurate speech,” he explained.

While the current release focuses on English-language speech, Octave also supports Spanish and is expected to expand its language capabilities in the near future.

Tailored for content creation

Octave is tailored for content creators and media production, with a range of applications.

“This new model is designed for offline text-to-speech, perfect for audiobooks, podcasts, video voiceovers and video game characters, where creators need realistic, character-specific voices,” Cowen explained.

Users access it through Hume’s website, either on its Projects page or through an application programming interface (API). The “offline” part refers to the fact that the model is designed to produce discrete audio files that can be added to projects such as videos or audiobooks. It is not designed to carry on real-time conversation, though that could theoretically be approximated by piping text queries into the website.

Hume’s API allows developers to make up to 50 requests per minute to the new Octave model, with a maximum text length of 5,000 characters and descriptions capped at 1,000 characters. Each request can generate up to five outputs, and the supported audio formats include MP3, WAV and PCM.
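
For developers, those limits map naturally onto client-side checks run before a request is sent. The sketch below is a minimal illustration that assumes only the limits reported here (50 requests per minute, 5,000-character text, 1,000-character descriptions, up to five outputs, MP3/WAV/PCM). `TTSRequest`, `validate_request` and `SlidingWindowLimiter` are hypothetical names, not part of Hume’s SDK or API schema.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

# Limits as reported for Octave's API at launch; confirm against Hume's current docs.
MAX_TEXT_CHARS = 5_000
MAX_DESCRIPTION_CHARS = 1_000
MAX_OUTPUTS = 5
MAX_REQUESTS_PER_MINUTE = 50
SUPPORTED_FORMATS = {"mp3", "wav", "pcm"}


@dataclass
class TTSRequest:
    """Hypothetical request shape for illustration; not Hume's actual API schema."""
    text: str
    description: str = ""
    num_outputs: int = 1
    audio_format: str = "mp3"


def validate_request(req: TTSRequest) -> List[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    errors = []
    if len(req.text) > MAX_TEXT_CHARS:
        errors.append(f"text is {len(req.text)} chars (limit {MAX_TEXT_CHARS})")
    if len(req.description) > MAX_DESCRIPTION_CHARS:
        errors.append(f"description is {len(req.description)} chars (limit {MAX_DESCRIPTION_CHARS})")
    if not 1 <= req.num_outputs <= MAX_OUTPUTS:
        errors.append(f"num_outputs must be between 1 and {MAX_OUTPUTS}")
    if req.audio_format.lower() not in SUPPORTED_FORMATS:
        errors.append(f"unsupported format {req.audio_format!r}")
    return errors


class SlidingWindowLimiter:
    """Client-side limiter: at most `limit` calls in any rolling `window`-second span."""

    def __init__(self, limit: int = MAX_REQUESTS_PER_MINUTE, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sent: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if under the limit; otherwise return False."""
        now = self.clock()
        # Forget calls that have aged out of the rolling window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```

A caller would run `validate_request` first and, whenever `try_acquire()` returns False, wait briefly and retry. The injectable `clock` keeps the limiter easy to test without real sleeps.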

Hume’s earlier EVI series of models supports streaming, real-time, back-and-forth interactions. Those models remain available and will continue to be developed.

Hume AI offers subscription-based pricing, with tiers ranging from a free option up to a custom-priced Enterprise plan.

Here’s a concise breakdown of the options:

Free ($0/month) – 10,000 characters of text-to-speech per month (~10 minutes) with unlimited custom voices

Starter ($3/month) – 30,000 characters (~30 minutes) plus support for up to 20 projects

Creator ($10/month) – 100,000 characters (~100 minutes), usage-based pricing for additional characters ($0.20/1,000), and support for up to 1,000 projects

Pro ($50/month) – 500,000 characters (~500 minutes), lower usage-based pricing ($0.15/1,000), and support for up to 3,000 projects

Scale ($150/month) – 2,000,000 characters (~2,000 minutes), further reduced usage-based pricing ($0.13/1,000), and support for up to 10,000 projects

Business ($900/month) – 10,000,000 characters (~10,000 minutes), even lower usage-based pricing ($0.10/1,000), and support for up to 20,000 projects

Enterprise (custom pricing) – Unlimited usage, custom legal terms, security assurances, significantly discounted bulk pricing, and priority support
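
Because the paid tiers combine a base fee with per-1,000-character overage, the cheapest plan depends on monthly volume. The sketch below estimates it from the published tiers listed above, assuming overage is billed pro rata per character and that Free and Starter (which list no usage-based pricing) simply cap out. The $900/month plan is labeled “Business” here to distinguish it from the custom-priced Enterprise plan, which is omitted; `Tier`, `monthly_cost` and `cheapest_tier` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_usd: float
    included_chars: int
    overage_per_1k: Optional[float]  # None: no usage-based pricing listed for this tier


# Published tiers as listed in the article (custom-priced Enterprise omitted).
TIERS = [
    Tier("Free", 0, 10_000, None),
    Tier("Starter", 3, 30_000, None),
    Tier("Creator", 10, 100_000, 0.20),
    Tier("Pro", 50, 500_000, 0.15),
    Tier("Scale", 150, 2_000_000, 0.13),
    Tier("Business", 900, 10_000_000, 0.10),
]


def approx_minutes(chars: int) -> float:
    """Rough audio length, using the article's ~1,000 characters per minute ratio."""
    return chars / 1_000


def monthly_cost(tier: Tier, chars: int) -> Optional[float]:
    """Estimated USD cost of `chars` characters on `tier`, or None if the tier can't cover it."""
    extra = max(0, chars - tier.included_chars)
    if extra == 0:
        return float(tier.monthly_usd)
    if tier.overage_per_1k is None:
        return None
    # Assumption: overage is billed pro rata per character, not in whole 1,000-char blocks.
    return round(tier.monthly_usd + extra / 1_000 * tier.overage_per_1k, 2)


def cheapest_tier(chars: int) -> Tuple[str, float]:
    """Return (tier name, cost) of the cheapest listed tier; ties go to the lower tier."""
    best = None
    for tier in TIERS:
        cost = monthly_cost(tier, chars)
        if cost is not None and (best is None or cost < best[1]):
            best = (tier.name, cost)
    return best
```

For example, at 1,000,000 characters a month, Creator would cost $10 + 900 × $0.20 = $190, while Pro would cost $50 + 500 × $0.15 = $125, so `cheapest_tier(1_000_000)` returns `("Pro", 125.0)`.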

Overall, Hume emphasized that its Octave TTS pricing is around half the cost of the competing service from AI voice creation startup ElevenLabs, a sign of intensifying competition in the text-to-speech space.

In addition, Hume AI conducted a blind comparison study with 180 human raters to benchmark Octave against ElevenLabs. Across 120 diverse prompts, Octave was preferred for audio quality (71.6% of trials), naturalness (51.7% of trials) and how well the speech matched descriptions of the desired voice (57.7% of trials).

To further evaluate its performance, Hume AI has also launched the Expressive TTS Arena, a public benchmark designed to test how well AI models handle longer, expressive speech, an area that earlier TTS benchmarks have largely ignored.

Tens of trillions of language tokens

Unlike traditional text-to-speech systems that rely on limited speech datasets, Octave TTS is built on an LLM trained on tens of trillions of language tokens.

“Traditional text-to-speech models are trained on limited speech data, but ours is built on an LLM trained on tens of trillions of tokens, enabling it to reason, think, and infer emotions from text,” Cowen said.

The model was trained on tens of millions of hours of public, long-form speech data, along with Hume AI’s proprietary datasets of new voices recorded by survey participants.

“We collected data from people recording themselves through webcams, reacting naturally to videos, telling stories, and talking to others, including friends and family, to capture a wide range of emotional expressions,” Cowen said.

This extensive training allows the model to infer emotional context and follow detailed instructions, creating voices that match specific character descriptions and attributes.

Consistent character voices and limitations

Octave TTS maintains consistent character voices across long-form content.

“With our platform, you can generate unique voices for each character in an audiobook, like a middle-aged orc, and maintain that character’s voice throughout the story,” Cowen said.

This capability is supported by Hume AI’s “Projects” page, which handles long-form content such as audiobooks by automatically chunking text while preserving character consistency and context across chapters.
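
Hume has not described how Projects splits text, so the sketch below is only a plausible illustration, assuming the 5,000-character per-request limit reported for the API. It packs whole paragraphs into request-sized chunks, falls back to sentence boundaries for oversized paragraphs, and hard-splits only as a last resort. `chunk_text` and `MAX_CHARS` are hypothetical names; keeping a character’s voice consistent would additionally mean sending every chunk with the same voice settings, which this sketch does not model.

```python
import re
from typing import List, Tuple

MAX_CHARS = 5_000  # per-request text limit reported for Octave's API


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Greedily pack paragraphs, then sentences, into chunks of at most max_chars."""
    # Each unit is (text, separator to place before it when appended to a chunk).
    units: List[Tuple[str, str]] = []
    for para in re.split(r"\n\s*\n", text.strip()):
        para = para.strip()
        if not para:
            continue
        if len(para) <= max_chars:
            units.append((para, "\n\n"))
        else:
            # Oversized paragraph: fall back to sentence boundaries.
            sentences = [s for s in re.split(r"(?<=[.!?])\s+", para) if s]
            units.append((sentences[0], "\n\n"))
            units.extend((s, " ") for s in sentences[1:])

    chunks: List[str] = []
    current = ""
    for unit, sep in units:
        if len(unit) > max_chars:
            # A single sentence over the limit: hard-split it as a last resort.
            if current:
                chunks.append(current)
                current = ""
            chunks.extend(unit[i:i + max_chars] for i in range(0, len(unit), max_chars))
            continue
        candidate = current + sep + unit if current else unit
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph and sentence boundaries rather than at fixed offsets keeps each request’s text self-contained, which matters for a model that, per Cowen, reads whole paragraphs for context.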

Hume has technical guardrails built into its website and API that prohibit certain uses, but beyond those, it is open for use across a wide range of content and subjects, including potentially not-safe-for-work scenes such as those in popular romance novels.

“We give developers freedom, allowing content across a broad range of human experiences, though we restrict the creation of realistic children’s voices and imitations of specific individuals,” Cowen explained.

Cowen added that the company may adjust these guardrails for specific clients upon request, such as a children’s book publisher looking to create voices for children’s audiobooks.

Hume AI is working on a forthcoming Voice Cloning feature, which will allow users to replicate a voice from as little as five seconds of audio. The company is developing safeguards to ensure ethical use before rolling out the feature publicly.

With its combination of contextual awareness, emotional expression and character customization, Octave TTS aims to give content creators more control and flexibility, delivering voices that sound both realistic and emotionally engaging.
