How DeepSeek Rewrote the Transformer [MLA]
Thanks to KiwiCo for sponsoring today's video! Go to https://www.kiwico.com/welchlabs and use code WELCHLABS for 50% off your first monthly club crate or for 20% off your first Panda Crate!

MLA/DeepSeek poster at 17:12 (free shipping for a limited time with code DEEPSEEK): https://www.welchlabs.com/resources/mladeepseek-attention-poster-13x19
Limited edition MLA poster and signed book: https://www.welchlabs.com/resources/deepseek-bundle-mla-poster-and-signed-book-limited-run
Imaginary Numbers book is back in stock! https://www.welchlabs.com/resources/imaginary-numbers-book

Special Thanks to Patrons: https://www.patreon.com/c/welchlabs
Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin, Nicolas baumann, Jason Singh, Robert Riley, vornska, Barry Silverman, Jake Ehrlich

References
DeepSeek-V2 paper: https://arxiv.org/pdf/2405.04434
DeepSeek-R1 paper: https://arxiv.org/abs/2501.12948
Great article by Ege Erdil: https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
GPT-2 visualization: https://github.com/TransformerLensOrg/TransformerLens
Manim animations: https://github.com/stephencwelch/manim_videos

Technical Notes
1. The DeepSeek-V2 paper claims a KV cache size reduction of 93.3%. They don't publish their exact methodology, but as far as I can tell it's something like this: start with the DeepSeek-V2 hyperparameters here: https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/configuration_deepseek.py (num_hidden_layers=30, num_attention_heads=32, v_head_dim=128). If DeepSeek-V2 were implemented with traditional MHA, the KV cache size would be 2*32*128*30*2 = 491,520 B/token. With MLA's 576-element per-layer cache, the total cache size is 576*2*30 = 34,560 B/token. The percent reduction in KV cache size is then (491,520-34,560)/491,520 ≈ 93.0%. The numbers I present in this video follow the same approach but are for the DeepSeek-V3/R1 architecture: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/config.json (num_hidden_layers=61, num_attention_heads=128, v_head_dim=128). A traditional MHA cache would be 2*128*128*61*2 = 3,997,696 B/token; MLA reduces this to 576*2*61 = 70,272 B/token. For the DeepSeek-V3/R1 architecture, MLA therefore reduces the KV cache size by a factor of 3,997,696/70,272 ≈ 56.9X. This arithmetic is worked through in the sketch below.
2. I claim a couple of times that MLA allows DeepSeek to generate tokens more than 6x faster than a vanilla transformer. The DeepSeek-V2 paper claims a slightly less than 6x throughput improvement with MLA, but since the V3/R1 architecture is heavier, we expect a larger lift, which is why I claim "more than 6x faster than a vanilla transformer." In reality it's probably significantly more than 6x for the V3/R1 architecture.
3. In all attention patterns and walkthroughs, we're ignoring the |beginning of sentence| token. "The American flag is red, white, and" actually maps to 10 tokens if we include this starting token, and many attention patterns do assign high values to it.
4. We're ignoring bias terms in the matrix equations.
5. We're ignoring positional embeddings. These are fascinating; see the DeepSeek papers and RoPE.
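As a quick check on Note 1, here is that arithmetic as a minimal Python sketch (not code from the video; the function names are just for illustration). It assumes 2 bytes per element (fp16/bf16) and takes the 576-element MLA cache from the note above.

```python
# KV cache sizes per token, following Technical Note 1.
BYTES_PER_ELEMENT = 2  # fp16/bf16

def mha_kv_bytes(layers: int, heads: int, head_dim: int) -> int:
    # Traditional MHA caches a K and a V vector per head, per layer.
    return 2 * heads * head_dim * layers * BYTES_PER_ELEMENT

def mla_kv_bytes(layers: int, latent_dim: int = 576) -> int:
    # MLA caches one compressed 576-element vector per layer.
    return latent_dim * layers * BYTES_PER_ELEMENT

# DeepSeek-V2: 30 layers, 32 heads, 128-dim heads
v2_mha, v2_mla = mha_kv_bytes(30, 32, 128), mla_kv_bytes(30)
print(v2_mha, v2_mla)                       # 491520 34560
print(f"{(v2_mha - v2_mla) / v2_mha:.1%}")  # 93.0% reduction

# DeepSeek-V3/R1: 61 layers, 128 heads, 128-dim heads
v3_mha, v3_mla = mha_kv_bytes(61, 128, 128), mla_kv_bytes(61)
print(v3_mha, v3_mla)                       # 3997696 70272
print(f"{v3_mha / v3_mla:.1f}x")            # 56.9x smaller
```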
·youtube.com·
How to Build an In-N-Out Agent with OpenAI Agents SDK
In this video, I take a deeper dive into the OpenAI Agents SDK and how it can be used to build a fast food agent.
Colab: https://dripl.ink/MZw2R
For more tutorials on using LLMs and building agents, check out my Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below.
Building LLM Agents Form: https://drp.li/dIMes
👨‍💻 Github: https://github.com/samwit/llm-tutorials
⏱️ Time Stamps:
00:00 Intro
00:11 Creating an In-N-Out Agent (Colab Demo)
00:40 In-N-Out Burger Agent
04:35 Streaming Runs
05:40 Adding Tools
08:20 Websearch Tool
09:45 Agents as Tools
12:21 Giving it a Chat Memory
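As a rough sketch of the pattern built in the video (not the video's actual Colab code), here is a minimal agent with a single function tool using the OpenAI Agents SDK; the menu contents, prices, and prompt are made up for illustration.

```python
# Minimal In-N-Out-style agent with one function tool (pip install openai-agents).
from agents import Agent, Runner, function_tool

@function_tool
def get_menu() -> str:
    """Return the current menu with prices."""
    # Hypothetical menu data for the sketch.
    return "Hamburger $3.50, Cheeseburger $4.00, Double-Double $5.25, Fries $2.30, Shake $2.80"

agent = Agent(
    name="In-N-Out Agent",
    instructions="You take fast food orders. Use the menu tool for menu and price questions.",
    tools=[get_menu],
)

result = Runner.run_sync(agent, "How much is a Double-Double with fries?")
print(result.final_output)
```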
·youtube.com·
How to connect a LLM to Zotero for a private, local research assistant – fast, no code
Learn how to chat with your Zotero database using a private, local LLM with no coding required: Llama, DeepSeek, any LLM you want! Please like and subscribe to help support the channel. @LearnMetaAnalysis
Ollama: https://ollama.com/
Docker: https://www.docker.com/
Open WebUI Quickstart: https://docs.openwebui.com/getting-started/quick-start
Zotero: https://www.zotero.org/
Zotero directory information: https://www.zotero.org/support/zotero_data
Tutorials and how-to guides:
Getting started with Open WebUI: https://youtu.be/gm_1VUg3L24
Conventional meta-analysis: https://www.youtube.com/playlist?list=PLXa5cTEormkEbYpBIgikgE0y9QR7QIgzs
Three-level meta-analysis: https://www.youtube.com/playlist?list=PLXa5cTEormkHwRmu_TJXa7fSb6-WBXXoJ
Three-level meta-analysis with correlated and hierarchical effects and robust variance estimation: https://www.youtube.com/playlist?list=PLXa5cTEormkEGenfcnp9X5dQUhmm7f9Jp
Want free point-and-click (no coding required) meta-analysis software? Check out Simple Meta-Analysis: https://learnmeta-analysis.com/pages/simple-meta-analysis-software
Tired of manually extracting data for systematic review and meta-analysis? Check out AI-Assisted Data Extraction, a free package for R! https://youtu.be/HuWXbe7hgFc
Free ebook on meta-analysis in R (no download required): https://noah-schroeder.github.io/reviewbook/
Visit our website at https://learnmeta-analysis.com/
0:00 What we're building
1:40 Requirements
7:05 Sync Zotero database
10:13 Custom model
12:13 It works!
17:26 Changing LLM
18:54 Updating knowledge database
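The workflow itself is point-and-click, but the setup reduces to a couple of commands. This is a sketch based on the Ollama and Open WebUI quickstart pages linked above (verify the exact flags there; the model tag is just an example):

```bash
# Pull a local model with Ollama (example model tag)
ollama pull deepseek-r1:7b

# Run Open WebUI in Docker, per the Open WebUI quickstart
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:main
# Then open http://localhost:3000, create a custom model, and attach your
# synced Zotero directory as a knowledge collection.
```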
·youtube.com·
You HAVE to Try Agentic RAG with DeepSeek R1 (Insane Results)
DeepSeek R1, the latest and greatest open source reasoning LLM, has taken the world by storm, and a lot of content creators are doing a great job covering its implications and strengths/weaknesses. What I haven't seen much of, though, is actually using R1 in agentic workflows to truly leverage its power. So that's what I'm showing you in this video: we'll be using the power of R1 to make a simple but super effective agentic RAG setup.

We'll be using Smolagents by Hugging Face to create our agent. It's the simplest agent framework out there, and many of you have been asking me to try it out.

This agentic RAG setup centers around the idea that reasoning LLMs like R1 are extremely powerful but quite slow. Because of this, a lot of people are starting to experiment with combining the raw power of a model like R1 with a more lightweight, fast LLM that drives the primary conversation/agent flow. Think of it as giving an agent R1 as a tool to use when it needs more reasoning power, at the cost of a slower response (and higher costs). That's what we'll be doing here: creating an agent with an R1-driven RAG tool to extract in-depth insights from a knowledgebase.

The example in this video is meant to be an introduction to these kinds of reasoning agentic flows. That's why I keep it simple with Smolagents and a local knowledgebase. But I'm planning on expanding this much further soon with a much more robust but still similar flow built with Pydantic AI and LangGraph!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Community Voting period of the oTTomator Hackathon is open! Head on over to the Live Agent Studio now, test out the submissions, and vote for your favorite agents. There are so many incredible projects to try out! https://studio.ottomator.ai

All the code covered in this video + instructions to run it can be found here: https://github.com/coleam00/ottomator-agents/tree/main/r1-distill-rag
SmolAgents: https://huggingface.co/docs/smolagents/en/index
R1 on Ollama: https://ollama.com/library/deepseek-r1
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00:00 - Why R1 for Agentic RAG?
01:56 - Overview of our Agent
03:33 - SmolAgents - Our Ticket to Fast Agents
06:07 - Building our Agentic RAG Agent with R1
14:17 - Creating our Local Knowledgebase w/ Chroma DB
15:45 - Getting our Local LLMs Set Up with Ollama
19:15 - R1 Agentic RAG Demo
21:42 - Outro
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Join me as I push the limits of what is possible with AI. I'll be uploading videos at least two times a week: Sundays and Wednesdays at 7:00 PM CDT!
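Here is a minimal sketch of that two-model pattern with smolagents: a fast model drives the agent, and R1 sits behind a RAG tool. The model tags, Chroma setup, and embedding choice are assumptions for illustration (the linked repo has the real implementation), and smolagents API details may vary by version.

```python
# Sketch: a fast LLM drives the agent; DeepSeek R1 answers RAG queries as a tool.
# Assumes Ollama is serving both models and a Chroma store exists at ./chroma_db.
from langchain_chroma import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from smolagents import CodeAgent, LiteLLMModel, tool

reasoner = LiteLLMModel(model_id="ollama_chat/deepseek-r1:7b")  # slow, strong
driver = LiteLLMModel(model_id="ollama_chat/llama3.2")          # fast, light

vectordb = Chroma(persist_directory="chroma_db",
                  embedding_function=HuggingFaceEmbeddings())

@tool
def rag_with_reasoner(user_query: str) -> str:
    """Retrieve from the knowledgebase and let the reasoning model answer.

    Args:
        user_query: The question to answer from the knowledgebase.
    """
    docs = vectordb.similarity_search(user_query, k=3)
    context = "\n\n".join(d.page_content for d in docs)
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {user_query}"
    return reasoner([{"role": "user", "content": prompt}]).content

agent = CodeAgent(tools=[rag_with_reasoner], model=driver)
agent.run("What does the knowledgebase say about agent frameworks?")
```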
·youtube.com·
I built a DeepSeek R1 powered VS Code extension…
Learn how to build a VS Code extension from scratch. In this fun tutorial, we integrate DeepSeek R1 directly into our editor to build a custom AI assistant.
Go Deeper: https://fireship.io/courses
Related Content:
VS Code Extension Template: https://code.visualstudio.com/api/get-started/your-first-extension
Ollama DeepSeek R1: https://ollama.com/library/deepseek-r1
DeepSeek R1 First Look: https://youtu.be/-2k1rcRzsLA
DeepSeek Fallout: https://youtu.be/Nl7aCUsWykg
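The extension in the video is written in TypeScript, but its core job is streaming completions from the local Ollama server. Here is that interaction as a minimal Python sketch against Ollama's HTTP API (the model tag is an example; the VS Code wiring lives in the extension template linked above):

```python
# Stream a chat completion from a local Ollama server (the piece a DeepSeek R1
# editor assistant calls under the hood). Assumes `ollama pull deepseek-r1:7b`.
import json
import requests

def chat(prompt: str, model: str = "deepseek-r1:7b") -> str:
    """Stream a response from Ollama, printing tokens as they arrive."""
    resp = requests.post(
        "http://localhost:11434/api/chat",
        json={"model": model,
              "messages": [{"role": "user", "content": prompt}],
              "stream": True},
        stream=True,
    )
    reply = []
    for line in resp.iter_lines():
        if not line:
            continue
        chunk = json.loads(line)  # Ollama streams newline-delimited JSON
        token = chunk.get("message", {}).get("content", "")
        print(token, end="", flush=True)
        reply.append(token)
        if chunk.get("done"):
            break
    return "".join(reply)

chat("Explain what a VS Code extension activation event is.")
```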
·youtube.com·
LangGraph Crash Course with code examples
Colab 01. Learning LangGraph - Agent Executor: https://drp.li/vL1J9
Colab 02. Learning LangGraph - Chat Executor: https://drp.li/HAz3o
Colab 03. Learning LangGraph - Agent Supervisor: https://drp.li/xvEwd
Interested in building LLM Agents? Fill out the form below.
Building LLM Agents Form: https://drp.li/dIMes
Github: https://github.com/samwit/langchain-tutorials
(updated) https://github.com/samwit/llm-tutorials
Time Stamps:
00:00 Intro
00:19 What is LangGraph?
00:26 LangGraph Blog
01:38 StateGraph
02:16 Nodes
02:42 Edges
03:48 Compiling the Graph
05:23 Code Time
05:34 Agent with new create_open_ai
21:37 Chat Executor
27:00 Agent Supervisor
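For reference, the StateGraph / nodes / edges / compile flow covered in the video looks roughly like this minimal sketch (the node logic is a placeholder; the Colabs above build real agent executors):

```python
# Minimal LangGraph graph: shared state, one node, one edge, then compile and run.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    input: str
    output: str

def agent_node(state: AgentState) -> dict:
    # A real node would call an LLM here; we echo to keep the sketch runnable.
    return {"output": f"processed: {state['input']}"}

workflow = StateGraph(AgentState)       # the graph carries a shared state
workflow.add_node("agent", agent_node)  # nodes are plain functions over state
workflow.set_entry_point("agent")
workflow.add_edge("agent", END)         # edges define control flow

app = workflow.compile()                # compile into a runnable graph
print(app.invoke({"input": "hello"}))
```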
·youtube.com·
The Illustrated Transformer
Discussions: Hacker News (65 points, 4 comments), Reddit r/MachineLearning (29 points, 3 comments)
Translations: Arabic, Chinese (Simplified) 1, Chinese (Simplified) 2, French 1, French 2, Italian, Japanese, Korean, Persian, Russian, Spanish 1, Spanish 2, Vietnamese
Watch: MIT's Deep Learning State of the Art lecture referencing this post
Featured in courses at Stanford, Harvard, MIT, Princeton, CMU and others

In the previous post, we looked at Attention, a ubiquitous method in modern deep learning models. Attention is a concept that helped improve the performance of neural machine translation applications. In this post, we will look at The Transformer, a model that uses attention to boost the speed with which these models can be trained. The Transformer outperforms the Google Neural Machine Translation model in specific tasks. The biggest benefit, however, comes from how The Transformer lends itself to parallelization. It is in fact Google Cloud's recommendation to use The Transformer as a reference model to use their Cloud TPU offering. So let's try to break the model apart and look at how it functions.

The Transformer was proposed in the paper Attention is All You Need. A TensorFlow implementation of it is available as part of the Tensor2Tensor package. Harvard's NLP group created a guide annotating the paper with PyTorch implementation. In this post, we will attempt to oversimplify things a bit and introduce the concepts one by one to hopefully make it easier to understand for people without in-depth knowledge of the subject matter.

2020 Update: I've created a "Narrated Transformer" video which is a gentler approach to the topic.

A High-Level Look
Let's begin by looking at the model as a single black box. In a machine translation application, it would take a sentence in one language, and output its translation in another.
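Not part of the original article, but for readers who want to see the core computation the post builds toward, here is single-head scaled dot-product attention as a minimal numpy sketch:

```python
# softmax(Q K^T / sqrt(d_k)) V for a single attention head.
import numpy as np

def attention(Q: np.ndarray, K: np.ndarray, V: np.ndarray) -> np.ndarray:
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V               # weighted sum of value vectors

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))  # 4 tokens, d_k = 8
print(attention(Q, K, V).shape)  # (4, 8)
```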
·jalammar.github.io·