AI/ML

2208 bookmarks


MCP is all you need — Samuel Colvin, Pydantic

Everyone is talking about agents, and right after that, they’re talking about agent-to-agent communications. Not surprisingly, various nascent, competing protocols are popping up to handle it. But maybe all we need is MCP, the OG of GenAI communication protocols (it's from way back in 2024!).

Last year, Jason Liu gave the second most watched AIE talk: “Pydantic is all you need”. This year, I (the creator of Pydantic) am continuing the tradition by arguing that MCP might be all we need for agent-to-agent communications.

What I’ll cover:
- Misusing Common Patterns: MCP was designed for desktop/IDE applications like Claude Code and Cursor. How can we adapt MCP for autonomous agents?
- Many Common Problems: MCP is great, but what can go wrong? How can you work around it? Can the protocol be extended to solve these issues?
- Monitoring Complex Phenomena: How does observability work (and not work) with MCP?
- Multiple Competing Protocols: A quick run-through of other agent communication protocols like A2A and AGNTCY, and probably a few more by June 😴
- Massive Crustaceans Party: What might success look like if everything goes to plan?

---related links---
https://x.com/samuel_colvin
https://www.linkedin.com/in/samuel-colvin/
https://github.com/samuelcolvin
https://pydantic.dev/

Timestamps
00:00:00 - Introduction: Speaker Samuel Colvin introduces himself as the creator of Pydantic.
00:00:42 - Pydantic Ecosystem: Introduction to Pydantic the company, the Pydantic AI agent framework, and the Logfire observability platform.
00:01:18 - Talk Thesis: Explaining the title "MCP is all you need" and the main argument that MCP simplifies agent communication.
00:02:05 - MCP's Focus: Clarifying that the talk focuses on MCP for autonomous agents and custom code, not its original desktop automation use case.
00:02:48 - Tool Calling Primitive: Highlighting that "tool calling" is the most relevant MCP primitive for this context.
00:03:10 - MCP vs. OpenAPI: Listing the advantages MCP has over a simple OpenAPI specification for tool calls.
00:03:21 - Feature 1: Dynamic Tools: Tools can appear and disappear based on server state.
00:03:26 - Feature 2: Streaming Logs: The ability to return log data to the user while a tool is still executing.
00:03:33 - Feature 3: Sampling: A mechanism for a tool (server) to request an LLM call back through the agent (client).
00:04:01 - MCP Architecture Diagram: Visualizing the basic agent-to-tool communication flow.
00:04:43 - Complex Architecture: Discussing scenarios where tools are themselves agents that need LLM access.
00:05:24 - Explaining Sampling: Detailing how sampling solves the problem of every agent needing its own LLM by allowing tools to "piggyback" on the client's LLM access.
00:06:42 - Pydantic AI's Role in Sampling: How the Pydantic AI library supports sampling on both the client and server side.
00:07:10 - Demo Start: Beginning the demonstration of a research agent that uses an MCP tool to query BigQuery.
00:08:23 - Code Walkthrough: Validation: Showing how Pydantic is used for output validation and automatic retries (model_retry).
00:09:00 - Code Walkthrough: Context Logging: Demonstrating the use of mcp_context.log to send progress updates back to the client.
00:10:51 - MCP Server Setup: Showing the code for setting up an MCP server using fast_mcp.
00:11:54 - Design Pattern: Inference Inside the Tool: Explaining the benefit of having the tool perform its own LLM inference to reduce the context burden on the main agent.
00:12:27 - Main Application Code: Reviewing the client-side code that defines the agent and registers the MCP tool.
00:13:16 - Observability with Logfire: Switching to the Logfire UI to trace the execution of the agent's query.
00:14:09 - Observing Sampling in Action: Pointing out the specific span in the trace that shows the tool making an LLM call back through the client via sampling.
00:14:48 - Inspecting the SQL Query: Showing how the observability tool can be used to see the exact SQL query that was generated by the internal agent.
00:15:15 - Conclusion: Final summary of the talk's points.

·youtube.com·today at 11:34 AM


Context Engineering for AI Agents: Lessons from Building Manus

This post shares the local optima Manus arrived at through our own "SGD". If you're building your own AI agent, we hope these principles help you converge faster.

#agent #architecture

·manus.im·yesterday at 12:03 PM


The Hidden Metric That Determines AI Product Success

Co-authored by Assaf Elovic and Harrison Chase. You can also find a version of this post published on Assaf's Medium. Why do some AI products explode in adoption while others struggle to gain traction? After a decade of building AI products and watching hundreds of launches across the industry, we’

·blog.langchain.com·yesterday at 11:09 AM


A Prominent OpenAI Investor Appears to Be Suffering a ChatGPT-Related Mental Health Crisis, His Peers Say

Bedrock co-founder Geoff Lewis has posted increasingly troubling content on social media, drawing concern from friends in the industry.

#health #psychology

·futurism.com·yesterday at 10:44 AM


Kimi K2

While most people focused on Grok, there was another model release that got uniformly high praise: Kimi K2 from Moonshot.ai. …

·lesswrong.com·Jul 19, 2025


Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone

This morning, working entirely on my phone, I scraped a conference website and vibe coded up an alternative UI for interacting with the schedule using a combination of OpenAI Codex …

·simonwillison.net·Jul 18, 2025


My favorite use-case for AI is writing logs

One of my favorite AI dev products today is Full Line Code Completion in PyCharm (bundled with the IDE since late 2023). It’s extremely well-thought out,...

#programming #dev #tools #logging

·newsletter.vickiboykis.com·Jul 18, 2025


NameQuick - Rename files effortlessly on macOS

Transform your file management with intelligent, automated file renaming powered by AI.

#mac #app #Automation #macos

·namequick.app·Jul 17, 2025


Context Engineering: Isaac Miller on Context Engineering with DSPy

Context engineering is rising in popularity because prompting alone isn't enough—we're still figuring out how to build reliable AI systems. From extracting s...

#prompt #agent

·youtube.com·Jul 16, 2025


LLM Inference Handbook

A practical handbook for engineers building, optimizing, scaling and operating LLM inference systems in production.

#learn #model training

·bentoml.com·Jul 16, 2025


LLM Daydreaming

Proposal & discussion of how default mode networks for LLMs are an example of missing capabilities for search and novelty in contemporary AI systems.

·gwern.net·Jul 16, 2025


Kimi K2 and when "DeepSeek Moments" become normal

One "DeepSeek Moment" wasn't enough for us to wake up, hopefully we don't need a third.

#deepresearch #politics

·interconnects.ai·Jul 15, 2025


DSPy 3.0 — and DSPy at Databricks

The DSPy OSS team at Databricks and beyond is excited to present DSPy 3.0, targeted for release close to DAIS 2025. We will present what DSPy is and how it evolved over the past year. We will discuss greatly improved prompt optimization and finetuning/RL capabilities, improved productionization and observability via thorough and native integration with MLflow, and lessons from usage of DSPy in various Databricks R&D and professional services contexts.

Talk By: Krista Opsahl-Ong, Research Engineer, Databricks; Omar Khattab, Research Scientist, Databricks

Databricks Named a Leader in the 2025 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms: https://www.databricks.com/blog/databricks-named-leader-2025-gartner-magic-quadrant-data-science-and-machine-learning
Build and deploy quality AI agent systems: https://www.databricks.com/product/artificial-intelligence
See all the product announcements from Data + AI Summit: https://www.databricks.com/events/dataaisummit-2025-announcements

Connect with us:
Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc

#prompt #agent #fine tuning #programming

·youtube.com·Jul 14, 2025


Switching to Claude Code + VSCode inside Docker

Why I decided to ditch Cursor and switch to running Claude Code in an isolated environment + diy guide!

#docker #IDE #vscode

·timsh.org·Jul 14, 2025


How o3 and Grok 4 Accidentally Vindicated Neurosymbolic AI

Neurosymbolic AI is quietly winning. Here’s what that means – and why it took so long

·garymarcus.substack.com·Jul 13, 2025


Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks — and it’s free

Chinese AI startup Moonshot releases open-source Kimi K2 model that outperforms OpenAI and Anthropic on coding tasks with breakthrough agentic capabilities and competitive pricing.

·venturebeat.com·Jul 13, 2025


Scott Alexander’s Misleading Victory Lap

Whatever happened to steelmanning?

·garymarcus.substack.com·Jul 11, 2025


9to5Mac - Page 2 of 6831 - Apple News & Mac Rumors Breaking All Day

Apple News & Mac Rumors Breaking All Day

#health #medical #apple

·9to5mac.com·Jul 11, 2025


Lena Shakurova - Making LLMs reliable - A practical framework | PyData London 25

www.pydata.org Making LLMs reliable: A practical framework for production. LLM outputs are non-deterministic, making it difficult to ensure reliability in produ...

#testing

·youtube.com·Jul 10, 2025


Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines

Large Language Models (LLMs) excel at understanding messy, real-world data, but integrating them into production systems remains challenging. Prompts can be unruly to write, vary by model and can be difficult to manage in the large context of a pipeline. In this session, we'll demonstrate incorporating LLMs into a geospatial conflation pipeline, using DSPy. We'll discuss how DSPy works under the covers and highlight the benefits it provides pipeline creators and managers.

Talk By: Drew Breunig, Data Science Leader & Strategist, Overture Maps Foundation

#prompt #programming #code

·youtube.com·Jul 10, 2025


RetrievalTutorials/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb at main · FullStackRetrieval-com/RetrievalTutorials

Contribute to FullStackRetrieval-com/RetrievalTutorials development by creating an account on GitHub.

#RAG #search #embedding

·github.com·Jul 9, 2025


Evaluating Chunking Strategies for Retrieval | Chroma Research

#RAG #embedding #search #benchmark

·research.trychroma.com·Jul 9, 2025


The BEST Way to Chunk Text for RAG

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/AdamLucek/ You’ll also get 20% off an annual premium subscript...

#RAG

·youtube.com·Jul 9, 2025


MonoQwen-Vision, the first visual document reranker - LightOn

We introduce MonoQwen2-VL-v0.1, the first visual document reranker to enhance the quality of the retrieved visual documents and take these pipelines to the next level. Reranking a small number of candidates with MonoQwen2-VL-v0.1 achieves top results on the ViDoRe leaderboard.

#vision #OCR

·lighton.ai·Jul 9, 2025


Wharton Generative AI Labs Prompt Library | All Prompts


#prompt #writing

·hd3ns092ns.notion.site·Jul 9, 2025


If it cites em dashes as proof, it came from a tool.

It's a safe bet that most of us have encountered the age-old admonition to "never judge a book by its cover" at some point in our lives. There is a deep wisdom in that advice---wisdom that seems to go completely out the window as soon as a certain type of person spots a certain type of punctuation.

#writing

·scottsmitelli.com·Jul 9, 2025


SmolLM3: smol, multilingual, long-context reasoner

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

#local model #model training

·huggingface.co·Jul 9, 2025


Modern Information Retrieval Evaluation In The RAG Era

35% off our upcoming evals course: https://bit.ly/evals-ai Modern IR Evaluation in the RAG Era w/ Nandan Thakur. Learn about future directions in RAG evaluatio...

#RAG #search

·youtube.com·Jul 8, 2025


Optimizing RAG With Reasoning Models

Orion Weller presents new frontiers in information retrieval, focusing on how instruction following and reasoning capabilities from large language models can be integrated into retrieval systems. He introduces Promptriever, a fast embedder that can follow instructions, and Rank1, a powerful but slower reasoning reranker, demonstrating their ability to unlock new types of queries and significantly improve performance.

00:00 - New Frontiers in IR: Instruction Following and Reasoning
00:07 - Language Models (LLMs) & Their Key Capabilities
00:20 - Instruction Following
00:57 - Reasoning (Test-Time Compute)
01:41 - Bridging LLMs to Information Retrieval (IR)
01:52 - Evolution of Search (Google 1999 vs. Today)
02:17 - SearchGPT and Its Limitations
02:38 - Search Hasn't Changed Fundamentally
03:16 - Keyword Search (Traditional IR)
04:11 - Semantic Search (Modern IR)
04:38 - Instruction-Based Search (Proposed IR)
05:25 - Challenge: Reranking Alone Isn't Enough
06:02 - Prompt & Reasoning-Based Search (Advanced IR)
06:42 - What is an Instruction in IR? (Attributes & NLU)
07:31 - Call to Action: Prompt Retrievers Like LLMs
07:46 - Introducing Promptriever & Rank1
08:23 - Bi-Encoder vs. Cross-Encoder Architecture
09:10 - Can We Make Promptable Retrievers? (Promptriever's Idea)
10:08 - Generating Synthetic Instructions
10:34 - Promptriever Experimental Settings
11:20 - Promptriever Evaluation Data (FollowIR & InstructIR)
12:28 - Promptriever Instruction Following Results
12:59 - Promptriever Results: Out-of-Domain (OOD) with Generic Prompts
13:10 - Promptriever: Generic Prompt Examples
13:58 - Promptriever Performance with Generic Prompts (BEIR OOD)
14:44 - Promptriever: Robustness to Paraphrased Prompts
15:16 - Promptriever Summary
16:04 - Introducing Rank1 (Test-Time Compute for IR)
16:22 - Test-Time Compute in LLMs (O1 AIME example)
17:08 - What Does Test-Time Compute Look Like in IR? (Rank1 Example)
18:01 - Rank1 Evaluation Data (BRIGHT dataset)
18:50 - Rank1: Example of Model Reasoning (Leetcode Problem)
19:35 - Rank1 Results (BRIGHT, NevIR, mFollowIR)
20:15 - Rank1: Direct Comparison of Reasoning Chain
20:33 - Rank1: Finding New Relevant Documents (DL19/DL20)
21:05 - Re-judging Old Data (Explanation)
22:05 - Rank1 Summary
22:37 - The Goal: IR That Works Like LLMs
22:56 - Implications for Downstream Users
23:36 - Open Data/Open Source & Contact Info
23:45 - Q&A Session - Promptriever & Bi-Encoder
24:23 - Q&A Session - Operationalizing Promptriever
26:04 - Q&A Session - Cross-Encoder Integration
26:33 - Q&A Session - Meta-Search/Human-Provided Prompts
27:56 - Q&A Session - Rank1 vs. Frontier Reasoning Models
28:07 - Clarification on Rank1's Training Focus
28:30 - How Rank1 Compares to O3/Gemini
29:32 - Q&A Session - Fine-Tuning Rank1
30:19 - Q&A Session - Where to Find the Models
30:45 - Conclusion of Q&A

#RAG #search #nlp

·youtube.com·Jul 8, 2025


Transformer by hand ✍️

#transformers #tutorial

·byhand.ai·Jul 8, 2025
