But how do AI videos actually work? | Guest video by @WelchLabsVideo
Diffusion models, CLIP, and the math of turning text into images
Welch Labs Book: https://www.welchlabs.com/resources/imaginary-numbers-book
Sections
0:00 - Intro
3:37 - CLIP
6:25 - Shared Embedding Space
8:16 - Diffusion Models & DDPM
11:44 - Learning Vector Fields
22:00 - DDIM
25:25 - Dall E 2
26:37 - Conditioning
30:02 - Guidance
33:39 - Negative Prompts
34:27 - Outro
35:32 - About guest videos + Grant’s Reaction
Special Thanks to:
Jonathan Ho - Jonathan is an author of the DDPM paper and the Classifier-Free Guidance paper.
https://arxiv.org/pdf/2006.11239
https://arxiv.org/pdf/2207.12598
Preetum Nakkiran - Preetum has an excellent introductory diffusion tutorial:
https://arxiv.org/pdf/2406.08929
Chenyang Yuan - Many of the animations in this video were implemented using manim and Chenyang’s smalldiffusion library: https://github.com/yuanchenyang/smalldiffusion
Chenyang also has a terrific tutorial and an MIT course on diffusion models:
https://www.chenyang.co/diffusion.html
https://www.practical-diffusion.org/
Other References
All of Sander Dieleman’s diffusion blog posts are fantastic: https://sander.ai/
CLIP Paper: https://arxiv.org/pdf/2103.00020
DDIM Paper: https://arxiv.org/pdf/2010.02502
Score-Based Generative Modeling: https://arxiv.org/pdf/2011.13456
Wan2.1: https://github.com/Wan-Video/Wan2.1
Stable Diffusion: https://huggingface.co/stabilityai/stable-diffusion-2
Midjourney: https://www.midjourney.com/
Veo: https://deepmind.google/models/veo/
DallE 2 paper: https://cdn.openai.com/papers/dall-e-2.pdf
Code for this video: https://github.com/stephencwelch/manim_videos/tree/master/_2025/sora
Written by: Stephen Welch, with very helpful feedback from Grant Sanderson
Produced by: Stephen Welch, Sam Baskin, and Pranav Gundu
Technical Notes
The noise videos in the opening have been passed through a VAE (the diffusion process actually happens in a compressed “latent” space), which acts very much like a video compressor. This is why the noise videos don’t look like pure salt-and-pepper noise.
6:15 CLIP: Although directly minimizing cosine similarity on a single batch would push our vectors 180 degrees apart, in practice we need CLIP to maximize the uniformity of concepts over the hypersphere it operates on. For this reason, we animated these vectors as orthogonal-ish. See: https://proceedings.mlr.press/v119/wang20k/wang20k.pdf
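To make the cosine-similarity objective concrete, here is a minimal NumPy sketch of a CLIP-style symmetric contrastive loss over one batch. The toy one-hot embeddings and the fixed temperature of 0.07 are assumptions for illustration (CLIP learns its temperature), and this is not the code used for the video.

```python
import numpy as np

def l2_normalize(v):
    # Project each row onto the unit hypersphere so dot products are cosine similarities.
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    logits = img @ txt.T / temperature          # pairwise cosine similarities, scaled
    idx = np.arange(len(logits))                # matching pairs sit on the diagonal
    loss_images = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_texts = -log_softmax(logits, axis=0)[idx, idx].mean()   # text -> image
    return (loss_images + loss_texts) / 2

# Toy batch (assumed): three pairs whose non-matching embeddings are exactly orthogonal.
embeddings = np.eye(3)
loss_matched = clip_style_loss(embeddings, embeddings)
loss_shuffled = clip_style_loss(embeddings, embeddings[[1, 2, 0]])  # captions mismatched
```

In this toy batch, the matched loss is already close to zero with non-matching pairs merely orthogonal, while mismatching the captions makes the loss large.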
Per Chenyang Yuan: at 10:15, the blurry image that results when removing random noise in DDPM is probably due to a mismatch in noise levels when calling the denoiser. When the denoiser is called on x_{t-1} during DDPM sampling, it is expected to have a certain noise level (let's call it sigma_{t-1}). If you generate x_{t-1} from x_t without adding noise, then the noise present in x_{t-1} is always smaller than sigma_{t-1}. This causes the denoiser to remove too much noise, thus pointing towards the mean of the dataset.
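The noise-level mismatch can be illustrated with a toy case. The sketch below assumes 1D data drawn from a standard normal and noisy samples x = x0 + sigma * noise, for which the ideal denoiser is x / (1 + sigma^2). The sigma values are made up, and this is not the model from the video.

```python
import numpy as np

def ideal_denoiser(x, sigma):
    # Posterior mean E[x0 | x] when x0 ~ N(0, 1) and x = x0 + sigma * noise.
    return x / (1.0 + sigma**2)

rng = np.random.default_rng(0)
x0 = rng.standard_normal(100_000)
noise = rng.standard_normal(100_000)

expected_sigma = 1.0  # noise level the denoiser is told to expect
actual_sigma = 0.5    # noise actually present, smaller because no noise was re-added

x_noisy = x0 + actual_sigma * noise

est_mismatch = ideal_denoiser(x_noisy, expected_sigma)  # told there is more noise than there is
est_correct = ideal_denoiser(x_noisy, actual_sigma)     # told the true noise level

# Spread of the estimates around the dataset mean (0 in this toy setup).
spread_mismatch = est_mismatch.std()
spread_correct = est_correct.std()
```

With the overstated noise level, the estimates cluster more tightly around the dataset mean than they should. This matches the over-smoothing described above, at least in this simplified setting.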
The text conditioning input to Stable Diffusion is not the 512-dim text embedding vector, but the output of the layer before that, [with dimension 77x512](https://stackoverflow.com/a/79243065).
For the vectors at 31:40 - Some implementations use f(x, t, cat) + alpha(f(x, t, cat) - f(x, t)), and some use f(x, t) + alpha(f(x, t, cat) - f(x, t)). In the second form, an alpha value of 1 corresponds to no guidance. I chose the second format here to keep things simpler.
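The two guidance conventions can be compared numerically. This is a small sketch in which the function names and the example vectors are made up for illustration; f_uncond stands for f(x, t) and f_cond for f(x, t, cat).

```python
import numpy as np

def guide_from_conditional(f_uncond, f_cond, alpha):
    # First form: start from the conditional output; alpha = 0 means no guidance.
    return f_cond + alpha * (f_cond - f_uncond)

def guide_from_unconditional(f_uncond, f_cond, alpha):
    # Second form: start from the unconditional output; alpha = 1 means no guidance.
    return f_uncond + alpha * (f_cond - f_uncond)

# Hypothetical model outputs at a single point (x, t).
f_uncond = np.array([1.0, 0.0])
f_cond = np.array([1.5, 1.0])

no_guidance_a = guide_from_conditional(f_uncond, f_cond, 0.0)
no_guidance_b = guide_from_unconditional(f_uncond, f_cond, 1.0)

# The two forms agree when the second form's alpha is one larger than the first's.
guided_a = guide_from_conditional(f_uncond, f_cond, 2.0)
guided_b = guide_from_unconditional(f_uncond, f_cond, 3.0)
```

Both forms produce the same family of vectors. They differ only by an offset of 1 in alpha, so a guidance scale quoted for one convention has to be shifted before it is used with the other.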
At 30:30, the unconditional t=1 vector field looks a bit different from what it did at the 17:15 mark. This is because different models were trained for different parts of the video, and the difference is likely a result of different random initializations.
Premium Beat Music ID: EEDYZ3FP44YX8OWT
Using GitHub Spark to reverse engineer GitHub Spark
GitHub Spark was released in public preview yesterday. It’s GitHub’s implementation of the prompt-to-app pattern also seen in products like Claude Artifacts, Lovable, Vercel v0, Val Town Townie and Fly.io’s …
VibeTunnel: Turn Any Browser into Your Mac's Terminal | Peter Steinberger
We built a browser-based terminal controller in one day using Claude Code, named pipes, and Xterm.js. No SSH needed, just open your browser and start typing. Check and command your agents on the go!
Announcing Toad—a universal UI for agentic coding in the terminal
Will McGugan is building his own take on a terminal coding assistant, in the style of Claude Code and Gemini CLI, using his Textual Python library as the display layer. …
DSPy Tutorial | Build AI Agents with Python (Fundamentals)
Complete introduction to the simplest, most efficient, and yet most powerful way I’ve found to create AI agents, AI workflows, and AI programs in Python. Instead of manual prompting, we use automatic prompt optimization with DSPy and its concept of signatures.
Timestamps / Outline:
00:00 How to Call LLMs from Python, the Simple Way
0:21 Declare Your First AI Program (in 1 LOC)
2:24 Setting Up Your Large Language Model Backend
6:10 Program 2: Processing Images
9:14 Deeper Dive into Signatures
14:01 Program 3: Processing Entities from Paragraphs
19:19 Fetching text from Wikipedia with Attachments
20:39 Setting Up a DataFrame
22:22 Apply Gemini Flash lite to each paragraph
23:02 Creating a Synthetic Gold Set
24:35 Quick Baseline Evaluation
25:11 Creating DSPy Examples
25:55 Evaluation Metric
26:10 Prompt Optimization with DSPy
29:10 Final Evaluation
Follow Max:
Twitter: [https://x.com/MaximeRivest](https://x.com/MaximeRivest)
GitHub: [https://github.com/MaximeRivest](https://github.com/MaximeRivest)
Links to Relevant Repositories:
Attachments: [https://github.com/MaximeRivest/attachments](https://github.com/MaximeRivest/attachments)
DSPy: [https://github.com/stanfordnlp/dspy](https://github.com/stanfordnlp/dspy)
FunnyDSPy: [https://github.com/MaximeRivest/funnydspy](https://github.com/MaximeRivest/funnydspy)
Docs:
[https://dspy.ai/](https://dspy.ai/)
[https://maximerivest.github.io/attachments/](https://maximerivest.github.io/attachments/)
If you’re new to my channel, my name is Maxime Rivest. I’m an Applied AI Engineer and Data Engineer. I like to educate people on the best tools in Data Analytics and AI Engineering.
Max
Will McGugan may no longer be running a commercial company around Textual, but that hasn't stopped his progress on the open source project. He recently released v4 of his Python …
Everyone is talking about agents, and right after that, they’re talking about agent-to-agent communications. Not surprisingly, various nascent, competing protocols are popping up to handle it.
But maybe all we need is MCP — the OG of GenAI communication protocols (it's from way back in 2024!).
Last year, Jason Liu gave the second most watched AIE talk — “Pydantic is all you need”.
This year, I (the creator of Pydantic) am continuing the tradition by arguing that MCP might be all we need for agent-to-agent communications.
What I’ll cover:
- Misusing Common Patterns: MCP was designed for desktop/IDE applications like Claude Code and Cursor. How can we adapt MCP for autonomous agents?
- Many Common Problems: MCP is great, but what can go wrong? How can you work around it? Can the protocol be extended to solve these issues?
- Monitoring Complex Phenomena: How does observability work (and not work) with MCP?
- Multiple Competing Protocols: A quick run-through of other agent communication protocols like A2A and AGNTCY, and probably a few more by June 😴
- Massive Crustaceans Party: What might success look like if everything goes to plan?
---related links---
https://x.com/samuel_colvin
https://www.linkedin.com/in/samuel-colvin/
https://github.com/samuelcolvin
https://pydantic.dev/
Timestamps
00:00:00 - Introduction: Speaker Samuel Colvin introduces himself as the creator of Pydantic.
00:00:42 - Pydantic Ecosystem: Introduction to Pydantic the company, the Pydantic AI agent framework, and the Logfire observability platform.
00:01:18 - Talk Thesis: Explaining the title "MCP is all you need" and the main argument that MCP simplifies agent communication.
00:02:05 - MCP's Focus: Clarifying that the talk focuses on MCP for autonomous agents and custom code, not its original desktop automation use case.
00:02:48 - Tool Calling Primitive: Highlighting that "tool calling" is the most relevant MCP primitive for this context.
00:03:10 - MCP vs. OpenAPI: Listing the advantages MCP has over a simple OpenAPI specification for tool calls.
00:03:21 - Feature 1: Dynamic Tools: Tools can appear and disappear based on server state.
00:03:26 - Feature 2: Streaming Logs: The ability to return log data to the user while a tool is still executing.
00:03:33 - Feature 3: Sampling: A mechanism for a tool (server) to request an LLM call back through the agent (client).
00:04:01 - MCP Architecture Diagram: Visualizing the basic agent-to-tool communication flow.
00:04:43 - Complex Architecture: Discussing scenarios where tools are themselves agents that need LLM access.
00:05:24 - Explaining Sampling: Detailing how sampling solves the problem of every agent needing its own LLM by allowing tools to "piggyback" on the client's LLM access.
00:06:42 - Pydantic AI's Role in Sampling: How the Pydantic AI library supports sampling on both the client and server side.
00:07:10 - Demo Start: Beginning the demonstration of a research agent that uses an MCP tool to query BigQuery.
00:08:23 - Code Walkthrough: Validation: Showing how Pydantic is used for output validation and automatic retries (model_retry).
00:09:00 - Code Walkthrough: Context Logging: Demonstrating the use of mcp_context.log to send progress updates back to the client.
00:10:51 - MCP Server Setup: Showing the code for setting up an MCP server using fast_mcp.
00:11:54 - Design Pattern: Inference Inside the Tool: Explaining the benefit of having the tool perform its own LLM inference to reduce the context burden on the main agent.
00:12:27 - Main Application Code: Reviewing the client-side code that defines the agent and registers the MCP tool.
00:13:16 - Observability with Logfire: Switching to the Logfire UI to trace the execution of the agent's query.
00:14:09 - Observing Sampling in Action: Pointing out the specific span in the trace that shows the tool making an LLM call back through the client via sampling.
00:14:48 - Inspecting the SQL Query: Showing how the observability tool can be used to see the exact SQL query that was generated by the internal agent.
00:15:15 - Conclusion: Final summary of the talk's points.
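The sampling and logging ideas in the outline above can be sketched without any MCP library. The toy below is plain Python and does not use the real MCP SDK or wire protocol; every class, method, and string in it is hypothetical. It shows only the direction of the calls: the tool side has no LLM of its own and asks the client side to run one on its behalf.

```python
from typing import Callable, List

class ToyClient:
    """Stands in for the agent (client) side, which owns the LLM access."""

    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm
        self.logs: List[str] = []

    def handle_sampling_request(self, prompt: str) -> str:
        # "Sampling": the server asks the client to make an LLM call for it.
        return self.llm(prompt)

    def handle_log(self, message: str) -> None:
        # Progress messages sent by the tool while it is still running.
        self.logs.append(message)

class ToyServer:
    """Stands in for a tool (server) that has no LLM credentials of its own."""

    def __init__(self, client: ToyClient):
        self.client = client

    def run_query_tool(self, question: str) -> str:
        self.client.handle_log("generating SQL")
        sql = self.client.handle_sampling_request(f"Write SQL for: {question}")
        self.client.handle_log("running query")
        return f"ran: {sql}"

def fake_llm(prompt: str) -> str:
    # Placeholder for a real model call.
    return "SELECT 1"

client = ToyClient(fake_llm)
server = ToyServer(client)
result = server.run_query_tool("how many rows?")
```

In this arrangement the tool keeps its own reasoning (here, writing SQL) out of the main agent's context, while the client remains the only component that needs model access.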
Context Engineering for AI Agents: Lessons from Building Manus
This post shares the local optima Manus arrived at through our own "SGD". If you're building your own AI agent, we hope these principles help you converge faster.
The Hidden Metric That Determines AI Product Success
Co-authored by Assaf Elovic and Harrison Chase. You can also find a version of this post published on Assaf's Medium.
Why do some AI products explode in adoption while others struggle to gain traction? After a decade of building AI products and watching hundreds of launches across the industry, we’
Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone
This morning, working entirely on my phone, I scraped a conference website and vibe coded up an alternative UI for interacting with the schedule using a combination of OpenAI Codex …
One of my favorite AI dev products today is Full Line Code Completion in PyCharm (bundled with the IDE since late 2023). It’s extremely well-thought out,...
Context Engineering: Isaac Miller on Context Engineering with DSPy
Context engineering is rising in popularity because prompting alone isn't enough—we're still figuring out how to build reliable AI systems. From extracting s...
Proposal & discussion of how default mode networks for LLMs are an example of missing capabilities for search and novelty in contemporary AI systems.
The DSPy OSS team at Databricks and beyond is excited to present DSPy 3.0, targeted for release close to DAIS 2025. We will present what DSPy is and how it evolved over the past year. We will discuss greatly improved prompt optimization and finetuning/RL capabilities, improved productionization and observability via thorough and native integration with MLflow, and lessons from usage of DSPy in various Databricks R&D and professional services contexts.
Talk By: Krista Opsahl-Ong, Research Engineer, Databricks; Omar Khattab, Research Scientist, Databricks
Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks — and it’s free
Chinese AI startup Moonshot releases open-source Kimi K2 model that outperforms OpenAI and Anthropic on coding tasks with breakthrough agentic capabilities and competitive pricing.
Lena Shakurova - Making LLMs reliable - A practical framework | PyData London 25
www.pydata.org Making LLMs reliable: A practical framework for production. LLM outputs are non-deterministic, making it difficult to ensure reliability in produ...