The first generation of AI-powered products (often called “AI Wrapper” apps, because they are “just” wrapped around an LLM API) was quickly brought to market by small teams of engineers, …
An LLM Query Understanding Service
Doug Turnbull recently wrote about how all search is structured now: “Many times, even a small open source LLM will be able to turn a search query into reasonable structure at relatively low cost.”
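To make "turning a search query into structure" concrete, here is a minimal sketch of the validation step that would sit behind such a service. It assumes the LLM has been asked to reply with a JSON object; the `StructuredQuery` fields (`keywords`, `color`, `max_price`) and the function name are hypothetical, and no model is actually called.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredQuery:
    # Hypothetical fields, chosen only for illustration.
    keywords: str
    color: Optional[str] = None
    max_price: Optional[float] = None


def parse_structured_query(llm_json: str, fallback: str) -> StructuredQuery:
    """Validate the model's JSON reply; fall back to plain keywords on any problem."""
    try:
        data = json.loads(llm_json)
        if not isinstance(data, dict):
            raise ValueError("expected a JSON object")
        max_price = data.get("max_price")
        return StructuredQuery(
            keywords=str(data["keywords"]),
            color=data.get("color"),
            max_price=float(max_price) if max_price is not None else None,
        )
    except (ValueError, KeyError, TypeError):
        # Malformed or incomplete output: treat the raw query as keywords.
        return StructuredQuery(keywords=fallback)


# A reply the model might plausibly produce for "red sofa under $500".
reply = '{"keywords": "sofa", "color": "red", "max_price": 500}'
print(parse_structured_query(reply, fallback="red sofa under $500"))
```

Falling back to the raw query keeps search working even when a small model returns something unparseable.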
Model Context Protocol has prompt injection security problems
As more people start hacking around with implementations of MCP (the Model Context Protocol, a new standard for making tools available to LLM-powered systems) the security implications of tools built on that protocol are starting to come into focus.
Reinforcement Learning with Neural Networks: Essential Concepts
Reinforcement Learning has helped train neural networks to win games, drive cars and even get ChatGPT to sound more human when it responds to your prompt. Th...
Ai Will Try to Cheat & Escape (aka Rob Miles was Right!) - Computerphile
As Large Language Models improve, the tokens they predict form ever more complicated and nuanced outcomes. Rob Miles and Ryan Greenblatt discuss "Alignment F...
Get the full code: https://github.com/riza-io/examples/tree/main/demos/mcp_and_pydanticai
Try out Riza's MCP server: https://docs.riza.io/getting-started/mcp-servers
PydanticAI is one of the most popular frameworks for building AI agents in Python, and it recently launched MCP support. If you've been wanting to learn MCP, this is a great place to start.
In this tutorial you'll build a simple agent with PydanticAI, then add an MCP server called fetch, which enables web browsing. Next we'll use the Postgres MCP server to add database querying, and finally Riza's remote MCP server to add a code interpreter.
Pydantic Evals
Brand new package from the Pydantic AI team which directly tackles what I consider to be the single hardest problem in AI engineering: building evals to determine if your LLM-based system is working correctly and getting better over time.
Google's Gemma 3 model (the 27B variant is particularly capable, I've been trying it out [via Ollama](https://ollama.com/library/gemma3)) supports function calling exclusively through prompt engineering. The official documentation describes two recommended …
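As a rough illustration of function calling done purely through prompting, the sketch below describes a tool in the prompt text and then parses a call out of the model's reply. Everything here is an assumption for illustration: the prompt wording, the JSON reply shape (`name` plus `parameters`), and the `get_weather` tool are hypothetical and are not taken from the Gemma documentation, and the model reply is hard-coded rather than generated.

```python
import json
import re


def get_weather(city: str) -> str:
    # Hypothetical tool; a real one would call a weather service.
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}


def build_prompt(user_message: str) -> str:
    """Describe the available tool and the expected reply format in plain text."""
    return (
        "You can call this function: get_weather(city: str).\n"
        'To call it, reply only with JSON like {"name": ..., "parameters": {...}}.\n'
        f"User: {user_message}"
    )


def extract_call(model_output: str):
    """Return the parsed call dict, or None if the reply holds no valid call."""
    match = re.search(r"\{.*\}", model_output, re.DOTALL)
    if match is None:
        return None
    try:
        call = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    if not isinstance(call, dict) or call.get("name") not in TOOLS:
        return None
    return call


def dispatch(call: dict) -> str:
    """Run the named tool with the parameters the model supplied."""
    return TOOLS[call["name"]](**call.get("parameters", {}))


# Hard-coded stand-in for a model reply.
reply = 'Calling a tool: {"name": "get_weather", "parameters": {"city": "Paris"}}'
call = extract_call(reply)
if call is not None:
    print(dispatch(call))
```

Because the model only emits text, the calling code has to treat the reply as untrusted input, which is why unknown function names and malformed JSON are rejected before dispatch.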
The Playwright team at Microsoft have released an MCP ([Model Context Protocol](https://github.com/microsoft/playwright-mcp)) server wrapping Playwright, and it's pretty fascinating. They implemented it on top of the Chrome accessibility tree, so …
"I want Llama3 to perform 10x with my private knowledge" - Local Agentic RAG w/ llama3
Advanced RAG 101 - build agentic RAG with llama3. Get free HubSpot report of how AI is redefining startup GTM strategy: https://clickhubspot.com/4hx 🔗 Links - J...
Shortform link:
https://shortform.com/artem
In this video we will talk about backpropagation – an algorithm powering the entire field of machine learning and try to derive it from first principles.
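For readers who want something to run alongside the video, here is a minimal sketch of the curve-fitting idea: gradient descent on a straight line, with the gradients worked out by hand via the chain rule. The toy data, learning rate, and step count are my own choices for illustration and are not taken from the video.

```python
# Toy data generated by y = 2x + 1 (chosen for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]


def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            pred = w * x + b                    # forward pass
            dloss_dpred = 2 * (pred - y) / n    # d(MSE)/d(pred) for this point
            grad_w += dloss_dpred * x           # chain rule: d(pred)/dw = x
            grad_b += dloss_dpred               # chain rule: d(pred)/db = 1
        w -= lr * grad_w                        # step against the gradient
        b -= lr * grad_b
    return w, b


w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # prints 2.0 1.0
```

Backpropagation applies the same chain-rule bookkeeping automatically across a whole computational graph instead of two hand-derived terms.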
OUTLINE:
00:00 Introduction
01:28 Historical background
02:50 Curve Fitting problem
06:26 Random vs guided adjustments
09:43 Derivatives
14:34 Gradient Descent
16:23 Higher dimensions
21:36 Chain Rule Intuition
27:01 Computational Graph and Autodiff
36:24 Summary
38:16 Shortform
39:20 Outro
USEFUL RESOURCES:
Andrej Karpathy's playlist: https://youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&si=zBUZW5kufVPLVy9E
Jürgen Schmidhuber's blog on the history of backprop:
https://people.idsia.ch/~juergen/who-invented-backpropagation.html
CREDITS:
Icons by https://www.freepik.com/
A CLI that helps you easily install and manage Model Context Protocol Servers. Simple package management with comprehensive analytics and GitHub integration.
Claude MCP has Changed AI Forever - Here's What You NEED to Know
Everyone is starting to realize how big of a deal Claude’s Model Context Protocol (MCP) is - it’s the first ever “standard” for connecting LLMs with services like your database, Slack, GitHub, web search, etc. It’s VERY powerful and not well understood by many, so in this video I break down everything you need to know about MCP at a high level.
I go quick here unlike my usual videos, but I call out a bunch of different resources you can use to dive into anything deeper that you’re curious about - MCP architecture, building your own MCP server, integrating your custom AI agent with MCP, etc.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Check out Stagehand, an incredible tool to crawl and scrape websites with natural language which I used in this video:
https://github.com/browserbase/stagehand
And here is the Stagehand MCP server that I showcased (you will need a Browserbase API key which is free to start!):
https://github.com/browserbase/mcp-server-browserbase/blob/main/stagehand/README.md
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Documentation for Claude’s MCP:
https://modelcontextprotocol.io/introduction
List of MCP Servers on GitHub:
https://github.com/modelcontextprotocol/servers
Example n8n MCP Agent:
https://github.com/coleam00/ottomator-agents/tree/main/n8n-mcp-agent
n8n Community Node for MCP:
https://github.com/nerding-io/n8n-nodes-mcp
Example Pydantic AI MCP Agent:
https://github.com/coleam00/ottomator-agents/tree/main/pydantic-ai-mcp-agent
Dive deep into the architecture of MCP:
https://modelcontextprotocol.io/docs/concepts/architecture
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
00:00 - MCP is Blowing Up
01:55 - What is MCP?
03:12 - Making MCP "Click" (Deep Dive with Diagrams)
05:33 - How Agents Work with MCP
07:16 - Word of Caution - What MCP Isn't
08:17 - Where You Can Use MCP
09:47 - MCP Servers You Can Use NOW
11:18 - How to Set Up MCP Servers
12:08 - Using MCP Servers in Claude Desktop
13:11 - MCP Demo in Claude Desktop (Brave + Stagehand)
14:09 - Building with MCP (Servers and Clients)
15:22 - Building Your Own MCP Server
18:09 - MCP with n8n AI Agents
20:10 - MCP with Python AI Agents
21:56 - The Future of MCP
23:51 - Outro
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Join me as I push the limits of what is possible with AI. I'll be uploading videos at least two times a week - Sundays and Wednesdays at 7:00 PM CDT!
Nicholas Carlini, previously deeply skeptical about the utility of LLMs, discusses at length his thoughts on where the technology might go. He presents compelling, detailed arguments for both ends of …
Thanks to KiwiCo for sponsoring today’s video! Go to https://www.kiwico.com/welchlabs and use code WELCHLABS for 50% off your first monthly club crate or for 20% off your first Panda Crate!
MLA/DeepSeek Poster at 17:12 (Free shipping for a limited time with code DEEPSEEK):
https://www.welchlabs.com/resources/mladeepseek-attention-poster-13x19
Limited edition MLA Poster and Signed Book:
https://www.welchlabs.com/resources/deepseek-bundle-mla-poster-and-signed-book-limited-run
Imaginary Numbers book is back in stock!
https://www.welchlabs.com/resources/imaginary-numbers-book
Special Thanks to Patrons https://www.patreon.com/c/welchlabs
Juan Benet, Ross Hanson, Yan Babitski, AJ Englehardt, Alvin Khaled, Eduardo Barraza, Hitoshi Yamauchi, Jaewon Jung, Mrgoodlight, Shinichi Hayashi, Sid Sarasvati, Dominic Beaumont, Shannon Prater, Ubiquity Ventures, Matias Forti, Brian Henry, Tim Palade, Petar Vecutin, Nicolas baumann, Jason Singh, Robert Riley, vornska, Barry Silverman, Jake Ehrlich
References
DeepSeek-V2 paper: https://arxiv.org/pdf/2405.04434
DeepSeek-R1 paper: https://arxiv.org/abs/2501.12948
Great Article by Ege Erdil: https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
GPT-2 Visualization: https://github.com/TransformerLensOrg/TransformerLens
Manim Animations: https://github.com/stephencwelch/manim_videos
Technical Notes
1. Note that the DeepSeek-V2 paper claims a KV cache size reduction of 93.3%. They don’t exactly publish their methodology, but as far as I can tell it’s something like this: start with the DeepSeek-V2 hyperparameters here: https://huggingface.co/deepseek-ai/DeepSeek-V2/blob/main/configuration_deepseek.py. num_hidden_layers=30, num_attention_heads=32, v_head_dim = 128. If DeepSeek-V2 was implemented with traditional MHA, then KV cache size would be 2*32*128*30*2=491,520 B/token. With MLA with a KV cache size of 576, we get a total cache size of 576*30*2=34,560 B/token. The percent reduction in KV cache size is then equal to (491,520-34,560)/491,520=93.0%. The numbers I present in this video follow the same approach but are for the DeepSeek-V3/R1 architecture: https://huggingface.co/deepseek-ai/DeepSeek-V3/blob/main/config.json. num_hidden_layers=61, num_attention_heads=128, v_head_dim = 128. So traditional MHA cache would be 2*128*128*61*2 = 3,997,696 B/token. MLA reduces this to 576*61*2=70,272 B/token. For the DeepSeek-V3/R1 architecture, MLA reduces the KV cache size by a factor of 3,997,696/70,272 = 56.9X.
2. I claim a couple of times that MLA allows DeepSeek to generate tokens more than 6x faster than a vanilla transformer. The DeepSeek-V2 paper claims a slightly less than 6x throughput improvement with MLA, but since the V3/R1 architecture is heavier, we expect a larger lift, which is why I claim “more than 6x faster than a vanilla transformer” - in reality it’s probably significantly more than 6x for the V3/R1 architecture.
3. In all attention patterns and walkthroughs, we’re ignoring the |beginning of sentence| token. “The American flag is red, white, and” actually maps to 10 tokens if we include this starting token, and many attention patterns do assign high values to this token.
4. We’re ignoring bias terms in the matrix equations.
5. We’re ignoring positional embeddings. These are fascinating. See the DeepSeek papers and RoPE.
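The KV cache arithmetic in technical note 1 can be re-run with a short script. This is a minimal sketch, assuming 2 bytes per cached value (the trailing *2 in the note's formulas) and the 576-wide MLA cache entry the note cites; the function names are illustrative and do not come from the DeepSeek code.

```python
def mha_kv_bytes_per_token(num_layers, num_heads, head_dim, bytes_per_value=2):
    """Traditional multi-head attention: one key and one value vector per head per layer."""
    return 2 * num_heads * head_dim * num_layers * bytes_per_value


def mla_kv_bytes_per_token(num_layers, latent_dim=576, bytes_per_value=2):
    """MLA: one shared compressed cache entry of size latent_dim per layer."""
    return latent_dim * num_layers * bytes_per_value


# Hyperparameters as quoted in note 1.
configs = {
    "DeepSeek-V2 (as quoted)": dict(num_layers=30, num_heads=32, head_dim=128),
    "DeepSeek-V3/R1": dict(num_layers=61, num_heads=128, head_dim=128),
}

for name, cfg in configs.items():
    mha = mha_kv_bytes_per_token(**cfg)
    mla = mla_kv_bytes_per_token(cfg["num_layers"])
    reduction = (mha - mla) / mha
    print(f"{name}: MHA {mha:,} B/token, MLA {mla:,} B/token, "
          f"reduction {reduction:.1%}, factor {mha / mla:.1f}x")
```

Keeping the bytes-per-value factor as an explicit parameter makes it easy to see that the reduction ratio does not depend on precision, since the factor cancels between the two cache sizes.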
In this video I look at SmolDocling and how it compares to the other OCR solutions that are out there, both open and proprietary. Blog: https://huggingface.c...
How to Build an In-N-Out Agent with OpenAI Agents SDK
In this video, I take a deeper look at the OpenAI Agents SDK and how it can be used to build a fast food agent.
Colab: https://dripl.ink/MZw2R
For more tutorials on using LLMs and building agents, check out my Patreon
Patreon: https://www.patreon.com/SamWitteveen
Twitter: https://x.com/Sam_Witteveen
🕵️ Interested in building LLM Agents? Fill out the form below
Building LLM Agents Form: https://drp.li/dIMes
👨💻Github:
https://github.com/samwit/llm-tutorials
⏱️Time Stamps:
00:00 Intro
00:11 Creating an In-N-Out Agent (Colab Demo)
00:40 In-N-Out Burger Agent
04:35 Streaming runs
05:40 Adding Tools
08:20 Websearch Tool
09:45 Agents as Tools
12:21 Giving it a Chat Memory
Gemma 3 represents Google’s approach to accessible AI, bridging the gap between cutting-edge research and practical application. While the Gemini family represents Google’s flagship, closed, and most powerful models, Gemma offers a lightweight, “open” counterpart designed for wider use and customization. Specifically, Gemma 3’s model weights are openly released, allowing developers to download, deploy, and … Continue reading "Gemma 3: What You Need To Know"
Gemma 3 - The NEW Gemma Family Members Have Arrived!!!
In this video, I look at the release of the new Gemma 3 models, which come in four different flavors: a 1B, a 4B, a 12B, and the new Big 27B parameter model.
Demo: https://huggingface.co/spaces/huggingface-projects/gemma-3-12b-it
Blog: https://blog.google/technology/developers/gemma-3/?linkId=sam_witteveen
Model Weights: https://huggingface.co/collections/google/gemma-3-release-67c6c6f89c4f76621268bb6d
⏱️Time Stamps: