Context Engineering for AI Agents: Lessons from Building Manus
This post shares the local optima Manus arrived at through our own "SGD". If you're building your own AI agent, we hope these principles help you converge faster.
The Hidden Metric That Determines AI Product Success
Co-authored by Assaf Elovic and Harrison Chase. You can also find a version of this post published on Assaf's Medium.
Why do some AI products explode in adoption while others struggle to gain traction? After a decade of building AI products and watching hundreds of launches across the industry, we’
Vibe scraping and vibe coding a schedule app for Open Sauce 2025 entirely on my phone
This morning, working entirely on my phone, I scraped a conference website and vibe coded up an alternative UI for interacting with the schedule using a combination of OpenAI Codex …
One of my favorite AI dev products today is Full Line Code Completion in PyCharm (bundled with the IDE since late 2023). It’s extremely well thought out,...
Context Engineering: Isaac Miller on Context Engineering with DSPy
Context engineering is rising in popularity because prompting alone isn't enough—we're still figuring out how to build reliable AI systems. From extracting s...
Proposal & discussion of how default mode networks for LLMs are an example of missing capabilities for search and novelty in contemporary AI systems.
The DSPy OSS team at Databricks and beyond is excited to present DSPy 3.0, targeted for release close to DAIS 2025. We will present what DSPy is and how it evolved over the past year. We will discuss greatly improved prompt optimization and finetuning/RL capabilities, improved productionization and observability via thorough and native integration with MLflow, and lessons from usage of DSPy in various Databricks R&D and professional services contexts.
Talk By: Krista Opsahl-Ong, Research Engineer, Databricks; Omar Khattab, Research Scientist, Databricks
Databricks Named a Leader in the 2025 Gartner® Magic Quadrant™ for Data Science and Machine Learning Platforms: https://www.databricks.com/blog/databricks-named-leader-2025-gartner-magic-quadrant-data-science-and-machine-learning
Build and deploy quality AI agent systems: https://www.databricks.com/product/artificial-intelligence
See all the product announcements from Data + AI Summit: https://www.databricks.com/events/dataaisummit-2025-announcements
Connect with us: Website: https://databricks.com
Twitter: https://twitter.com/databricks
LinkedIn: https://www.linkedin.com/company/databricks
Instagram: https://www.instagram.com/databricksinc
Facebook: https://www.facebook.com/databricksinc
Moonshot AI’s Kimi K2 outperforms GPT-4 in key benchmarks — and it’s free
Chinese AI startup Moonshot releases open-source Kimi K2 model that outperforms OpenAI and Anthropic on coding tasks with breakthrough agentic capabilities and competitive pricing.
Lena Shakurova - Making LLMs reliable - A practical framework | PyData London 25
www.pydata.org Making LLMs reliable: A practical framework for production. LLM outputs are non-deterministic, making it difficult to ensure reliability in produ...
Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines
Large Language Models (LLMs) excel at understanding messy, real-world data, but integrating them into production systems remains challenging. Prompts can be unruly to write, vary by model and can be difficult to manage in the large context of a pipeline. In this session, we'll demonstrate incorporating LLMs into a geospatial conflation pipeline, using DSPy. We'll discuss how DSPy works under the covers and highlight the benefits it provides pipeline creators and managers.
Talk By: Drew Breunig, Data Science Leader & Strategist, Overture Maps Foundation
MonoQwen-Vision, the first visual document reranker - LightOn
We introduce MonoQwen2-VL-v0.1, the first visual document reranker to enhance the quality of the retrieved visual documents and take these pipelines to the next level. Reranking a small number of candidates with MonoQwen2-VL-v0.1 achieves top results on the ViDoRe leaderboard.
Wharton Generative AI Labs Prompt Library | All Prompts
A tool that connects everyday work into one space. It gives you and your teams AI tools—search, writing, note-taking—inside an all-in-one, flexible workspace.
If it cites em dashes as proof, it came from a tool.
It's a safe bet that most of us have encountered the age-old admonition to "never judge a book by its cover" at some point in our lives. There is a deep wisdom in that advice---wisdom that seems to go completely out the window as soon as a certain type of person spots a certain type of punctuation.
Modern Information Retrieval Evaluation In The RAG Era
35% off our upcoming evals course: https://bit.ly/evals-ai Modern IR Evaluation in the RAG Era w/ Nandan Thakur. Learn about future directions in RAG evaluatio...
Orion Weller presents new frontiers in information retrieval, focusing on how instruction following and reasoning capabilities from large language models can be integrated into retrieval systems. He introduces Promptriever, a fast embedder that can follow instructions, and Rank1, a powerful but slower reasoning reranker, demonstrating their ability to unlock new types of queries and significantly improve performance.
00:00 - New Frontiers in IR: Instruction Following and Reasoning
00:07 - Language Models (LLMs) & Their Key Capabilities
00:20 - Instruction Following
00:57 - Reasoning (Test-Time Compute)
01:41 - Bridging LLMs to Information Retrieval (IR)
01:52 - Evolution of Search (Google 1999 vs. Today)
02:17 - SearchGPT and Its Limitations
02:38 - Search Hasn't Changed Fundamentally
03:16 - Keyword Search (Traditional IR)
04:11 - Semantic Search (Modern IR)
04:38 - Instruction-Based Search (Proposed IR)
05:25 - Challenge: Reranking Alone Isn't Enough
06:02 - Prompt & Reasoning-Based Search (Advanced IR)
06:42 - What is an Instruction in IR? (Attributes & NLU)
07:31 - Call to Action: Prompt Retrievers Like LLMs
07:46 - Introducing Promptriever & Rank1
08:23 - Bi-Encoder vs. Cross-Encoder Architecture
09:10 - Can We Make Promptable Retrievers? (Promptriever's Idea)
10:08 - Generating Synthetic Instructions
10:34 - Promptriever Experimental Settings
11:20 - Promptriever Evaluation Data (FollowIR & InstructIR)
12:28 - Promptriever Instruction Following Results
12:59 - Promptriever Results: Out-of-Domain (OOD) with Generic Prompts
13:10 - Promptriever: Generic Prompt Examples
13:58 - Promptriever Performance with Generic Prompts (BEIR OOD)
14:44 - Promptriever: Robustness to Paraphrased Prompts
15:16 - Promptriever Summary
16:04 - Introducing Rank1 (Test-Time Compute for IR)
16:22 - Test-Time Compute in LLMs (O1 AIME example)
17:08 - What Does Test-Time Compute Look Like in IR? (Rank1 Example)
18:01 - Rank1 Evaluation Data (BRIGHT dataset)
18:50 - Rank1: Example of Model Reasoning (Leetcode Problem)
19:35 - Rank1 Results (BRIGHT, NevIR, mFollowIR)
20:15 - Rank1: Direct Comparison of Reasoning Chain
20:33 - Rank1: Finding New Relevant Documents (DL19/DL20)
21:05 - Re-judging Old Data (Explanation)
22:05 - Rank1 Summary
22:37 - The Goal: IR That Works Like LLMs
22:56 - Implications for Downstream Users
23:36 - Open Data/Open Source & Contact Info
23:45 - Q&A Session - Promptriever & Bi-Encoder
24:23 - Q&A Session - Operationalizing Promptriever
26:04 - Q&A Session - Cross-Encoder Integration
26:33 - Q&A Session - Meta-Search/Human-Provided Prompts
27:56 - Q&A Session - Rank1 vs. Frontier Reasoning Models
28:07 - Clarification on Rank1's Training Focus
28:30 - How Rank1 Compares to O3/Gemini
29:32 - Q&A Session - Fine-Tuning Rank1
30:19 - Q&A Session - Where to Find the Models
30:45 - Conclusion of Q&A
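The two-stage idea in the talk summary above (a fast embedder that takes instructions, followed by a slower reranker over a short candidate list) can be sketched with toy stand-ins. This is not Promptriever or Rank1: the bag-of-words `embed`, the overlap-based `rerank` scorer, and all function names are hypothetical, chosen only to show the shape of the pipeline.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a bi-encoder: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, instruction, docs, k=2):
    """Stage 1: embed query plus instruction once, embed each doc independently."""
    q = embed(f"{query} {instruction}".strip())
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def rerank(query, instruction, candidates):
    """Stage 2: score each (query, doc) pair jointly, as a cross-encoder would.

    Hypothetical scorer: query terms count double, instruction terms count once.
    """
    q_terms = set(query.lower().split())
    i_terms = set(instruction.lower().split())

    def score(doc):
        d_terms = set(doc.lower().split())
        return 2 * len(q_terms & d_terms) + len(i_terms & d_terms)

    return sorted(candidates, key=score, reverse=True)


docs = [
    "python tutorial for beginners",
    "advanced python performance tuning",
    "java tutorial for beginners",
]

candidates = retrieve("python tutorial", "for beginners", docs, k=2)
final = rerank("python tutorial", "for beginners", candidates)
print(candidates)
print(final)
```

In this toy setup the instruction changes which documents survive stage 1, and the pairwise scorer then orders only that short list. A real system would swap in learned models for both stages.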