LLMs

312 bookmarks
Authorization Before Retrieval: Making RAG Safe by Construction
Retrieval-augmented generation makes language models far more useful by grounding them in real data, but it also raises a hard question: who is allowed to see what? This post shows how authorization can be enforced before retrieval, ensuring that RAG systems remain powerful without becoming dangerous.
·windley.com·
DoAnything Blog
Learn about AI agents, automation, productivity, and how to get more done.
·doanything.com·
Systemic co-design with agentic engineers
Weeknotes 371 - The real shift is not from human coders to AI agents; it's from coding to engineering the environment where agents are co-designers. And other news on AI companion devices and robots at CES.
·iskandr.nl·
Welcome to Gas Town
Happy New Year, and Welcome to Gas Town!
·steve-yegge.medium.com·
The Future of Coding Agents
It has been three days since I launched Gas Town! 🔥⛽💥🛢️🔥 Woohoo!
·steve-yegge.medium.com·
Manus
Manus is the action engine that goes beyond answers to execute tasks, automate workflows, and extend your human reach.
·manus.im·
HumanLayer - Close your editor forever.
Deploy fleets of AI coding agents with the world's best UX for managing and collaborating on agent workloads. Fully open source CLI and local desktop orchestration, or use cloud sync and remote agents to scale to your whole team and all your devices.
·humanlayer.dev·
Blog
·timkellogg.me·
How Coding Agents Actually Work: Inside OpenCode
A hands-on exploration of OpenCode, an open-source coding agent built with a client/server architecture. Learn how AI tools, LLMs, and real-world constraints come together to create a powerful developer experience.
·cefboud.com·
Building an internal agent: Evals to validate workflows
Whenever a new pull request is submitted to our agent’s GitHub repository, we run a bunch of CI/CD operations on it. We run an opinionated linter, we run typechecking, and we run a bunch of unit tests. All of these work well, but none of them test entire workflows end-to-end. For that end-to-end testing, we introduced an eval pipeline. This is part of the Building an internal agent series. Why evals matter: The harnesses that run agents have a lot of interesting nuance, but they’re generally pretty simple: some virtual file management, some tool invocation, and some context window management. However, it’s very easy to create prompts that don’t work well, despite the correctness of all the underlying pieces. Evals are one tool to solve that, exercising your prompts and tools together and grading the results.
·lethain.com·
Building an internal agent: Logging and debugability
Agents are extremely impressive, but they also introduce a lot of non-determinism, and non-determinism means sometimes weird things happen. To combat that, we’ve needed to instrument our workflows to make it possible to debug why things are going wrong. This is part of the Building an internal agent series. Why logging matters: Whenever an agent does something sub-optimal, folks flag it as a bug. Often, the “bug” is ambiguity in the prompt that led to sub-optimal tool usage. That makes me feel better, but it doesn’t make the folks relying on these tools feel any better: they just expect the tools to work.
·lethain.com·
How to build RAG at scale
Retrieval-augmented generation breaks at scale because organizations treat it like an LLM feature rather than a platform discipline. Enterprises that succeed with RAG rely on a layered architecture.
·infoworld.com·
How I use AI agents to write code
Yes, this is the umpteenth article about AI and coding that you’ve seen this year. Welcome to 2025. Some people really find LLMs distasteful, and if that’s you, then I would recommend t…
·nolanlawson.com·
A small language model blueprint for automation in IT and HR
For IT and HR teams, SLMs can reduce the burden of repetitive tasks by automating ticket handling, routing, and approvals, while providing substantial cost savings versus LLMs.
·infoworld.com·
AI Is Not Your Policy Engine (And That's a Good Thing)
If your access control lives in a prompt, it isn’t access control. Authorization decisions must be deterministic and enforced before an LLM ever sees data. Treating AI as a policy engine is a category error with real consequences.
·windley.com·
This AI Vending Machine Was Tricked Into Giving Away Everything
Anthropic installed an AI-powered vending machine in the WSJ office. The LLM, named Claudius, was responsible for autonomously purchasing inventory from wholesalers, setting prices, tracking inventory, and generating a profit. The newsroom’s journ
·kottke.org·
Why LLMs Are Less Intelligent Than Crows
The basic concept of human intelligence entails self-awareness alongside the ability to reason and apply logic to one’s actions and daily life. Despite the very fuzzy definition of ‘hum…
·hackaday.com·