Extremely consequential new open weights model release from Google today. Multimodal by design: Gemma 3n natively supports image, audio, video, and text inputs and text outputs. Optimized for on-device: Engineered …
Building with Chatterbox TTS, Voice Cloning & Watermarking
In this video, I look at the new Chatterbox TTS from Resemble.AI and how it's improving open-source text-to-speech with its impressive voice cloning and emotion control capabilities. We explore its features, including zero-shot voice cloning that requires only a few seconds of audio, and its unique ability to adjust the emotional intensity of speech.
Colab: https://dripl.ink/Vxs8D
Blog: https://www.resemble.ai/chatterbox/
Hugging Face Spaces: https://huggingface.co/spaces/ResembleAI/Chatterbox
Hugging Face: https://huggingface.co/ResembleAI/chatterbox
GitHub: Chatterbox-TTS-Extended https://github.com/petermg/Chatterbox-TTS-Extended
For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 GitHub:
https://github.com/samwit/llm-tutorials
⏱️ Timestamps:
00:00 Intro
00:24 Resemble.AI - Chatterbox
01:53 Samples
04:53 Hugging Face: Chatterbox
05:22 Demo
06:26 Adding Exaggeration
08:56 Voice Cloning
13:00 Chatterbox TTS Extended Github
14:07 Hugging Face: Chatterbox GGUF
NotebookLM’s automatically generated podcasts are surprisingly effective
Audio Overview is a fun new feature of Google’s NotebookLM which is getting a lot of attention right now. It generates a one-off custom podcast against content you provide, where …
GitHub - ictnlp/LLaMA-Omni: LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.
AI-generated tabs, chords, lyrics, and melodies. Edit, transpose, and separate tracks easily. Explore over 40M songs. Also includes interactive learning and turns any music or song (YouTube, Deezer, SoundCloud, MP3) into chords. Play along with guitar, ukulele, or piano.
GitHub - collabora/WhisperFusion: WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.