Chatterbox-TTS Apple Silicon - a Hugging Face Space by Jimmi42
Upload a reference audio file and enter text to create audio in that voice. The app automatically chunks long text and uses Apple Silicon's GPU for faster processing.
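The Space's source isn't reproduced here, but the two mechanics it mentions (sentence chunking and Apple Silicon's GPU) boil down to something like the minimal PyTorch sketch below; the chunk_text helper and the 300-character budget are illustrative assumptions, not the app's actual implementation.

```python
import re
import torch

# Pick Apple Silicon's GPU (MPS backend) when available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

def chunk_text(text: str, max_chars: int = 300) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars characters,
    so each TTS call receives a short, natural-sounding piece of text."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```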
Building with Chatterbox TTS, Voice Cloning & Watermarking
In this video, I look at the new Chatterbox TTS from Resemble.AI and how it's improving open-source text-to-speech with its impressive voice cloning and emotion control capabilities. We explore its features, including zero-shot voice cloning that requires only a few seconds of audio, and its unique ability to adjust the emotional intensity of speech.
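For reference, cloning a voice with an exaggeration setting follows the pattern shown in the Chatterbox README; the reference file path below is a placeholder, and parameter names like exaggeration and cfg_weight may shift between releases, so treat this as a sketch rather than canonical usage.

```python
import torchaudio as ta
from chatterbox.tts import ChatterboxTTS

# Load the pretrained model (use "cuda", "mps", or "cpu" depending on your hardware).
model = ChatterboxTTS.from_pretrained(device="cuda")

text = "Welcome back to the channel, today we're cloning a voice."

# Zero-shot voice cloning: a few seconds of reference audio plus an
# exaggeration knob that raises or lowers the emotional intensity.
wav = model.generate(
    text,
    audio_prompt_path="reference_voice.wav",  # placeholder path to your sample
    exaggeration=0.7,                         # higher = more dramatic delivery
    cfg_weight=0.5,
)
ta.save("cloned.wav", wav, model.sr)
```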
Colab: https://dripl.ink/Vxs8D
Blog: https://www.resemble.ai/chatterbox/
Hugging Face Spaces: https://huggingface.co/spaces/ResembleAI/Chatterbox
Hugging Face: https://huggingface.co/ResembleAI/chatterbox
GitHub: Chatterbox-TTS-Extended https://github.com/petermg/Chatterbox-TTS-Extended
For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 GitHub:
https://github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:24 Resemble.AI - Chatterbox
01:53 Samples
04:53 Hugging Face: Chatterbox
05:22 Demo
06:26 Adding Exaggeration
08:56 Voice Cloning
13:00 Chatterbox TTS Extended Github
14:07 Hugging Face: Chatterbox GGUF
THIS is why large language models can understand the world
5 years ago, nobody would have guessed that scaling up LLMs would be as successful as it has been. That belief was due, in part, to the fact that all known statistical learning theory predicted that massively oversized models should overfit, and hence perform worse than smaller models. Yet the undeniable fact is that modern LLMs do possess models of the world that allow them to generalize beyond their training data.
Why do larger models generalize better than smaller models? Why does training a model to predict internet text cause it to develop world models? Come deep dive into the inner workings of neural network training to understand why scaling LLMs works so damn well.
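As a quick, standalone illustration of the double-descent effect the video leans on (not code from the video), minimum-norm least squares on random Fourier features typically shows test error spiking as the feature count approaches the number of training points and falling again well past it; the dataset, noise level, and feature counts below are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n, noise=0.1):
    x = rng.uniform(-1, 1, size=(n, 1))
    y = np.sin(2 * np.pi * x[:, 0]) + noise * rng.standard_normal(n)
    return x, y

x_tr, y_tr = make_data(40)    # small, noisy training set
x_te, y_te = make_data(1000)  # held-out test set

for p in [5, 10, 20, 40, 80, 200, 1000]:    # model size = number of random features
    W = 5.0 * rng.standard_normal((1, p))
    b = rng.uniform(0, 2 * np.pi, p)
    phi = lambda x: np.cos(x @ W + b)       # random Fourier features
    w = np.linalg.pinv(phi(x_tr)) @ y_tr    # minimum-norm (interpolating) solution
    mse = np.mean((phi(x_te) @ w - y_te) ** 2)
    print(f"features={p:5d}  test MSE={mse:.3f}")
```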
Want to see more videos like this in the future? Support me on Ko-fi https://ko-fi.com/algorithmicsimplicity
Papers referenced:
Double Descent: https://arxiv.org/abs/1812.11118
The Lottery Ticket Hypothesis: https://arxiv.org/abs/1803.03635
My previous videos on Autoregressive Transformers:
Auto-regression (and diffusion): https://youtu.be/zc5NTeJbk-k
Transformers: https://youtu.be/kWLed8o5M2Y
Exclusive: Anthropic hits $3 billion in annualized revenue on business demand for AI
Artificial intelligence developer Anthropic is making about $3 billion in annualized revenue, according to two sources familiar with the matter, in an early validation of generative AI use in the business world.
Thomas Ptacek's frustrated tone throughout this piece perfectly captures how it feels sometimes to be an experienced programmer trying to argue that "LLMs are actually really useful" in many corners …
Agentic Document Extraction: 17x Faster, Smarter, with LLM-Ready Outputs
Agentic Document Extraction just got faster! We've improved the median document processing time from 135 seconds to 8 seconds!
Agentic Document Extraction sees documents visually and uses an iterative workflow to accurately extract text, figures, form fields, charts, and more to create an LLM-ready output.
You can use our SDK to parse complex documents and get the extracted content in Markdown and JSON. You can then feed the output to an LLM, RAG application, or other downstream apps.
You can also use our Playground to test out Agentic Document Extraction.
Try out Agentic Document Extraction:
- Playground: https://va.landing.ai/demo/doc-extraction
- Library: https://github.com/landing-ai/agentic-doc
Learn more: https://landing.ai/agentic-document-extraction
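A minimal sketch of calling the SDK: the import path, the parse() entry point, the .markdown and .chunks attributes, and the VISION_AGENT_API_KEY variable are assumptions taken from a quick read of the repo and may differ between library versions, so check the agentic-doc README for the exact names.

```python
import os
from agentic_doc.parse import parse

# The service expects an API key from the LandingAI platform (assumed variable name).
os.environ.setdefault("VISION_AGENT_API_KEY", "<your-api-key>")

results = parse("invoice.pdf")       # local path or URL to a complex document
doc = results[0]
print(doc.markdown)                  # LLM-ready Markdown for the whole document
for chunk in doc.chunks:             # structured pieces: text, tables, figures, ...
    print(chunk.chunk_type, str(chunk.text)[:80])
```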
robertjakob/rigorous: A comprehensive suite of tools, built to liberate science by making the creation, evaluation, and dissemination of research more transparent, affordable, and efficient.
The ‘white-collar bloodbath’ is all part of the AI hype machine | CNN Business
If the CEO of a soda company declared that soda-making technology is getting so good it’s going to ruin the global economy, you’d be forgiven for thinking that person is either lying or fully detached from reality.
Private Local LlamaOCR with a User-Friendly Streamlit Front-End
Optical Character Recognition (OCR) is a powerful tool for extracting text from images, and with the rise of multimodal AI models, it's now easier than ever to implement locally. In this guide, we'll show you how to build a professional OCR application using Llama 3.2-Vision, Ollama for the backend, and Streamlit for the front end.
Prerequisites
Before we start, ensure you have the following:
1. Python 3.10 or higher installed.
2. Anaconda (optional).
3. Ollama installed for local model hosting.
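The core of the guide's setup looks roughly like the sketch below (a condensed approximation, not the article's full app): a Streamlit front end that uploads an image and an Ollama call to the llama3.2-vision model; pull the model first with `ollama pull llama3.2-vision`, then launch with `streamlit run app.py`.

```python
import ollama
import streamlit as st

st.title("Local Llama OCR")

uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
if uploaded is not None:
    st.image(uploaded, caption="Input image")
    if st.button("Extract text"):
        with st.spinner("Running llama3.2-vision via Ollama..."):
            response = ollama.chat(
                model="llama3.2-vision",
                messages=[{
                    "role": "user",
                    "content": "Extract all readable text from this image as Markdown.",
                    "images": [uploaded.getvalue()],  # raw image bytes from the uploader
                }],
            )
        st.markdown(response["message"]["content"])
```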
GraphRAG Explained: AI Retrieval with Knowledge Graphs & Cypher
Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam → https://ibm.biz/BdngMV
🚀 Try GraphRAG now! Access the code here → https://ibm.biz/BdngaC
Learn more about GraphRAG here → https://ibm.biz/BdngM9
🤖 Can AI turn text into structured knowledge? Discover how GraphRAG leverages knowledge graphs, graph databases, and Cypher queries to transform unstructured data into actionable insights. See how LLMs enable intelligent retrieval and automation, reshaping workflows across industries. 🚀
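To make the retrieval step concrete, here is a hedged sketch of the GraphRAG pattern the video describes: a Cypher query (hard-coded here, but normally generated by the LLM from the graph schema and the user's question) is run against a graph database, and the rows come back as grounding context for the final answer. The connection details and the Person/ACTED_IN/Movie schema are made-up examples, not the IBM demo's code.

```python
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

question = "Which actors appeared in movies released after 2015?"

# In a real GraphRAG pipeline the LLM would write this Cypher from the schema + question.
cypher = """
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
WHERE m.released > 2015
RETURN p.name AS actor, m.title AS movie
LIMIT 25
"""

with driver.session() as session:
    rows = [record.data() for record in session.run(cypher)]

# Pack the graph results into a prompt; send `prompt` to the LLM of your choice.
context = "\n".join(f"{r['actor']} - {r['movie']}" for r in rows)
prompt = f"Answer using only these graph results:\n{context}\n\nQuestion: {question}"
driver.close()
```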
AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM → https://ibm.biz/BdngMU
#knowledgegraph #cypher #ai
A compelling use case for AI: a Japanese to English translator that gives me a translation, a breakdown of the kanji (Chinese characters) in a Japanese phrase, and the ability to ask follow-up questions.
Chatterbox TTS - a Hugging Face Space by ResembleAI
This app creates speech audio from written text by imitating the style of a reference audio sample. You provide text and an optional audio file, and it generates high-quality speech that matches the reference voice.