AI/ML

2308 bookmarks
Building with Chatterbox TTS, Voice Cloning & Watermarking
In this video, I look at the new Chatterbox TTS from Resemble.AI and how it's improving open-source text-to-speech with its impressive voice cloning and emotion control capabilities. We explore its features, including zero-shot voice cloning that requires only a few seconds of audio, and its unique ability to adjust the emotional intensity of speech. Colab: https://dripl.ink/Vxs8D Blog: https://www.resemble.ai/chatterbox/ Hugging Face Spaces: https://huggingface.co/spaces/ResembleAI/Chatterbox Hugging Face: https://huggingface.co/ResembleAI/chatterbox GitHub: Chatterbox-TTS-Extended https://github.com/petermg/Chatterbox-TTS-Extended For more tutorials on using LLMs and building agents, check out my Patreon Patreon: https://www.patreon.com/SamWitteveen Twitter: https://x.com/Sam_Witteveen 🕵️ Interested in building LLM Agents? Fill out the form below Building LLM Agents Form: https://drp.li/dIMes 👨‍💻Github: https://github.com/samwit/llm-tutorials ⏱️Time Stamps: 00:00 Intro 00:24 Resemble.AI - Chatterbox 01:53 Samples 04:53 Hugging Face: Chatterbox 05:22 Demo 06:26 Adding Exaggeration 08:56 Voice Cloning 13:00 Chatterbox TTS Extended Github 14:07 Hugging Face: Chatterbox GGUF
·youtube.com·
THIS is why large language models can understand the world
Five years ago, nobody would have guessed that scaling up LLMs would be as successful as it has been. This belief, in part, was due to the fact that all known statistical learning theory predicted that massively oversized models should overfit, and hence perform worse than smaller models. Yet the undeniable fact is that modern LLMs do possess models of the world that allow them to generalize beyond their training data. Why do larger models generalize better than smaller models? Why does training a model to predict internet text cause it to develop world models? Come deep dive into the inner workings of neural network training to understand why scaling LLMs works so damn well. Want to see more videos like this in the future? Support me on Ko-fi https://ko-fi.com/algorithmicsimplicity Papers referenced: Double Descent: https://arxiv.org/abs/1812.11118 The Lottery Ticket Hypothesis: https://arxiv.org/abs/1803.03635 My previous videos on Autoregressive Transformers: Auto-regression (and diffusion): https://youtu.be/zc5NTeJbk-k Transformers: https://youtu.be/kWLed8o5M2Y
·youtube.com·
My AI Skeptic Friends Are All Nuts
Thomas Ptacek's frustrated tone throughout this piece perfectly captures how it feels sometimes to be an experienced programmer trying to argue that "LLMs are actually really useful" in many corners …
·simonwillison.net·
Agentic Document Extraction: 17x Faster, Smarter, with LLM-Ready Outputs
Agentic Document Extraction just got faster! We've cut the median document processing time from 135 seconds to 8 seconds! Agentic Document Extraction sees documents visually and uses an iterative workflow to accurately extract text, figures, form fields, charts, and more to create an LLM-ready output. You can use our SDK to parse complex documents and get the extracted content in Markdown and JSON. You can then feed the output to an LLM, RAG application, or other downstream apps. You can also use our Playground to test out Agentic Document Extraction. Try out Agentic Document Extraction: - Playground: https://va.landing.ai/demo/doc-extraction - Library: https://github.com/landing-ai/agentic-doc Learn more: https://landing.ai/agentic-document-extraction
·youtube.com·
robertjakob/rigorous: A comprehensive suite of tools, built to liberate science by making the creation, evaluation, and dissemination of research more transparent, affordable, and efficient.
A comprehensive suite of tools, built to liberate science by making the creation, evaluation, and dissemination of research more transparent, affordable, and efficient. - robertjakob/rigorous
·github.com·
Private Local LlamaOCR with a User-Friendly Streamlit Front-End
Optical Character Recognition (OCR) is a powerful tool for extracting text from images, and with the rise of multimodal AI models, it's now easier than ever to implement locally. In this guide, we'll show you how to build a professional OCR application using Llama 3.2-Vision, Ollama for the backend, and Streamlit for the front end. Prerequisites: Before we start, ensure you have the following: 1. Python 3.10 or higher installed. 2. Anaconda (optional). 3. Ollama installed for local model hosting. Downl
·gpt-labs.ai·
GraphRAG Explained: AI Retrieval with Knowledge Graphs & Cypher
Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam → https://ibm.biz/BdngMV 🚀 Try GraphRAG now! Access the code here → https://ibm.biz/BdngaC Learn more about GraphRAG here → https://ibm.biz/BdngM9 🤖 Can AI turn text into structured knowledge? Discover how GraphRAG leverages knowledge graphs, graph databases, and Cypher queries to transform unstructured data into actionable insights. See how LLMs enable intelligent retrieval and automation, reshaping workflows across industries. 🚀 AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM → https://ibm.biz/BdngMU #knowledgegraph #cypher #ai
·youtube.com·
BOND | BOND
BOND is a global technology investment firm that supports visionary founders throughout their entire life cycle of innovation & growth.
·bondcap.com·
Toolmen
Even the best weapon is an unhappy tool.
·aworkinglibrary.com·
Raycast AI as Translator
A compelling use case for AI: a Japanese to English translator that gives me a translation, breakdown of the Chinese characters in a Japanese phrase, and the ability to ask follow-up questions.
·scottwillsey.com·
Chatterbox TTS - a Hugging Face Space by ResembleAI
This app creates speech audio from written text by imitating the style of a reference audio sample. You provide text and an optional audio file, and it generates high-quality speech that matches th...
·huggingface.co·