
Boson AI


Making communication with AI as easy, natural and fun as talking to a human

About us

We are transforming how stories are told, knowledge is learned, and insights are gathered.

Website
https://boson.ai/
Industry
Research Services
Company size
11-50 employees
Headquarters
Santa Clara, CA
Type
Privately Held
Founded
2023
Specialties
Artificial Intelligence and Machine Learning

Updates

  • Kudos to my team at Boson AI for delivering our first audio model: Mu Li, Xingjian Shi, Yizhi Liu, Shuai Zheng, Ruskin Raj Manku, Dongming Shen, Yi Zhu, Silin Meng, Ke Bai, Yuyang (Rand) Xie, Jielin Qiu, Sergii Tiugaiev, Jaewon Lee, Alex Tay, Martin Ma, Zhangcheng (Zach) Zheng

  • At Boson AI, we work on making communication with AI as easy, natural, and fun as talking to a human. Today, we are excited to introduce Higgs Audio Understanding and Higgs Audio Generation: two powerful tools for building customized AI agents tailored to diverse audio understanding and generation needs.

    Higgs Audio Generation: Realistic, Emotionally Intelligent Speech

    Traditional text-to-speech (TTS) systems can sound robotic, miss emotional nuance, and struggle with names, accents, or multiple voices. Higgs Audio Generation changes the game by offering emotionally rich speech and realistic multi-voice conversations. Our model understands the implied tone, urgency, hesitation, and nuance in the text and renders it the way a real human would. It pronounces foreign names and places correctly and with the right accent, which makes it ideal for games, audiobooks, and screenplays. Higgs Audio Generation accomplishes this through its backing Large Language Model, which ensures that it doesn't just speak words but understands them in context. The model is trained on massive text-audio datasets for stunning realism. But don't just take our word for it: in our benchmark comparisons, Higgs Audio Generation beats OpenAI, Gemini, and ElevenLabs. Or try it out on our site.

    Complementing the suite is Higgs Audio Understanding, a model that understands voice and other audio inputs. This makes it ideally suited for tasks such as transcription (speech recognition), including meetings with multiple, sometimes slightly overlapping speakers. It also lets us offer a model that can directly answer questions about the received audio, i.e. perform audio understanding, without handing the signal off to a separate dedicated language model. As a result, it can reason about sounds (how many times did I clap my hands? where was the recording made?) and music (what chord is this?) at a high level of accuracy. Check out our magic broom shop demo to see how voice generation and audio understanding can work in harmony, e.g. for a retail application. Just like Audio Generation, this model is trained on massive text-audio datasets and uses an underlying LLM so that it understands rather than merely transcribes speech. In particular, this lets it benefit from Chain-of-Thought reasoning on complex understanding tasks.

    For more information, see https://lnkd.in/gYp_uBRk
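
Purely as an illustration of the two capabilities described in the post above, here is a minimal sketch of how a speech-generation request and an audio question-answering request might be wired together. It assumes a generic HTTP API: the base URL, endpoint paths, model names, request fields, and response format are all hypothetical placeholders, not Boson AI's documented interface.

import requests

# All endpoints, model names, and fields below are illustrative assumptions,
# not Boson AI's published API.
API_BASE = "https://api.example.invalid/v1"          # hypothetical base URL
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}   # placeholder credential

# 1) Generation: request emotionally rich, multi-voice speech from a short script.
gen_payload = {
    "model": "higgs-audio-generation",               # assumed model identifier
    "input": (
        "NARRATOR: The broom shop was quiet.\n"
        "CUSTOMER (hesitant): Do you... repair older models?"
    ),
    "voices": {"NARRATOR": "warm_low", "CUSTOMER": "bright_young"},  # assumed field
    "format": "wav",
}
gen_resp = requests.post(f"{API_BASE}/audio/generation",
                         json=gen_payload, headers=HEADERS, timeout=60)
gen_resp.raise_for_status()
with open("scene.wav", "wb") as f:
    f.write(gen_resp.content)                        # assumes raw audio bytes are returned

# 2) Understanding: send audio back with a question about its content.
with open("scene.wav", "rb") as f:
    und_resp = requests.post(
        f"{API_BASE}/audio/understanding",
        files={"audio": f},
        data={
            "model": "higgs-audio-understanding",    # assumed model identifier
            "question": "How many speakers are in this clip, and what is the mood?",
        },
        headers=HEADERS,
        timeout=60,
    )
und_resp.raise_for_status()
print(und_resp.json().get("answer", ""))             # assumed response field

The "magic broom shop" demo mentioned in the post presumably chains these two steps in a loop: understand the customer's audio, decide on a reply, then generate speech for it.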
