Public

25 bookmarks
How to think about agent frameworks
TL;DR:
* The hard part of building reliable agentic systems is making sure the LLM has the appropriate context at each step. This includes both controlling the exact content that goes into the LLM, as well as running the appropriate steps to generate relevant content.
* Agentic systems consist of both
·blog.langchain.dev·
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external data. Under this paradigm, we introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability by using a code executor to both validate proposed code reasoning tasks and verify answers, serving as a unified source of verifiable reward to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
·arxiv.org·
Why Moderna Merged Its Tech and HR Departments
The vaccine maker, which has partnered with OpenAI since 2023, is rethinking how it does workforce planning thanks to the growing capabilities of AI and other tech.
·wsj.com·
Are all startups doomed? ☠️
New 20VC episode out now with Wharton Professor Ethan Mollick.
·youtube.com·
AI agents: from co-pilot to autopilot
An in-depth look at the hype and reality around “agentic AI”: the use of AI agents that perform tasks autonomously.
·on.ft.com·
Big tech has a big Trump problem
Silicon Valley’s stars are beset by trustbusters and the trade war
·economist.com·