We're #hiring a new Member of Technical Staff, Modeling in Santa Clara, California. Apply today or share this post with your network.
Boson AI
Research Services
Santa Clara, CA · 3,281 followers
Making communication with AI as easy, natural and fun as talking to a human
About us
We are transforming how stories are told, knowledge is learned, and insights are gathered.
- Website: https://boson.ai/
- Industry: Research Services
- Company size: 11-50 employees
- Headquarters: Santa Clara, CA
- Type: Privately Held
- Founded: 2023
- Specialties: Artificial Intelligence and Machine Learning
Locations
- Primary: Santa Clara, CA 95054, US
- Toronto, Canada
Updates
- We're #hiring a new Senior Software Engineer - Agentic Systems and Platform in Santa Clara, California. Apply today or share this post with your network.
- We're #hiring a new Member of Technical Staff, Modeling in Toronto, Ontario. Apply today or share this post with your network.
- We're #hiring a new Member of Technical Staff, Evaluation in Santa Clara, California. Apply today or share this post with your network.
- We're #hiring a new Senior Full Stack Engineer - Agentic Systems and Platform in Santa Clara, California. Apply today or share this post with your network.
- We're #hiring a new Machine Learning Engineer - Enterprise in Toronto, Ontario. Apply today or share this post with your network.
- We're #hiring a new Senior Software Engineer - Ceph in Toronto, Ontario. Apply today or share this post with your network.
- We're #hiring a new Deep Learning Scientist in Santa Clara, California. Apply today or share this post with your network.
Kudos to my team at Boson AI for delivering our first audio model: Mu Li, Xingjian Shi, Yizhi Liu, Shuai Zheng, Ruskin Raj Manku, Dongming Shen, Yi Zhu, Silin Meng, Ke Bai, Yuyang (Rand) Xie, Jielin Qiu, Sergii Tiugaiev, Jaewon Lee, Alex Tay, Martin Ma, Zhangcheng (Zach) Zheng.
At Boson AI, we work on making communication with AI as easy, natural and fun as talking to a human. Today, we are excited to introduce Higgs Audio Understanding and Higgs Audio Generation, two powerful tools for building customized AI agents tailored to diverse audio understanding and generation needs.

Higgs Audio Generation: Realistic, Emotionally Intelligent Speech
Traditional text-to-speech (TTS) systems may sound robotic, miss emotional nuance, and struggle with names, accents, or multiple voices. Higgs Audio Generation changes the game by offering emotionally rich speech and realistic multi-voice conversations. Our model picks up the implied tone, urgency, hesitation and nuance in the text and renders it the way a real human would. It pronounces foreign names and places correctly, with the right accent, which makes it ideal for games, audiobooks and screenplays. Higgs Audio Generation accomplishes this through its backing Large Language Model, which ensures the system doesn't just speak words but understands them in context. The model is trained on massive text-audio datasets for stunning realism. But don't just take our word for it: in our benchmark comparisons, Higgs Audio Generation beats OpenAI, Gemini and ElevenLabs. Or try it out on our site.

Complementing the suite is Higgs Audio Understanding, a model that understands voice and other audio inputs. This makes it ideally suited for transcription (speech recognition), including meetings with multiple and sometimes slightly overlapping speakers. It can also answer questions about the received audio directly, performing audio understanding without handing the signal off to a separate dedicated language model. As a result, it can reason about sounds (how many times did I clap my hands, where was the recording made, etc.) and music (what's the chord) with a high level of accuracy. Check out our magic broom shop demo to see how voice generation and audio understanding work in harmony, e.g. in a retail application. Just like Audio Generation, the model is trained on massive text-audio datasets and uses an underlying LLM, allowing it to understand rather than merely transcribe speech. In particular, this lets it benefit from Chain of Thought reasoning on complex understanding tasks.

For more information, see https://lnkd.in/gYp_uBRk
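The post contrasts an end-to-end audio model with the classic hand-off from a speech recognizer to a separate text-only language model. Below is a minimal, hypothetical Python sketch of that contrast: every function here is an illustrative stub invented for this example, not Boson AI's actual API, and the canned return strings only stand in for real model outputs.

```python
# Hypothetical sketch of the two designs described in the post above.
# None of these names are Boson AI's real API; the stubs just make the
# control flow concrete and keep the script runnable end to end.

def asr_transcribe(audio_path: str) -> str:
    # Stand-in for a speech recognizer: it returns words only, so
    # non-speech cues (claps, chords, room tone) are dropped here.
    return "hello everyone, welcome to the meeting"

def llm_answer(transcript: str, question: str) -> str:
    # Stand-in for a text-only LLM: it can reason over the transcript,
    # but anything lost at the transcription stage is gone for good.
    return f"From the text {transcript!r} alone, I cannot hear any claps."

def audio_lm_answer(audio_path: str, question: str) -> str:
    # Stand-in for an end-to-end audio-understanding model: the same
    # model hears the waveform and answers directly, so sound events
    # survive all the way to the reasoning step.
    return "You clapped three times, in what sounds like a small room."

def pipeline_answer(audio_path: str, question: str) -> str:
    """Classic hand-off: ASR first, then a separate LLM over the text."""
    return llm_answer(asr_transcribe(audio_path), question)

if __name__ == "__main__":
    question = "How many times did I clap my hands?"
    print(pipeline_answer("meeting.wav", question))   # cues already lost
    print(audio_lm_answer("meeting.wav", question))   # cues preserved
```

The design point is that questions about non-speech audio can only be answered if the model reasoning about the question also has access to the raw signal, which is what the unified approach described above provides.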