Found 581 bookmarks
Newest
Seeing like an LLM
Seeing like an LLM
"I will run the tests again. I expect nothing. I am a leaf on the wind." an LLM while coding
·strangeloopcanon.com·
Seeing like an LLM
Agent Design Is Still Hard
Agent Design Is Still Hard
My Agent abstractions keep breaking somewhere I don’t expect.
TL;DR: Building agents is still messy. SDK abstractions break once you hit real tool use. Caching works better when you manage it yourself, but differs between models. Reinforcement ends up doing more heavy lifting than expected, and failures need strict isolation to avoid derailing the loop. Shared state via a file-system-like layer is an important building block. Output tooling is surprisingly tricky, and model choice still depends on the task.
Vercel AI SDK but only the provider abstractions
differences between models are significant enough that you will need to build your own agent abstraction.
Because the right abstraction is not yet clear, using the original SDKs from the dedicated platforms keeps you fully in control.
cache management is much easier when targeting their SDK directly instead of the Vercel one
dealing with provider-side tools.
web search tool from Anthropic routinely destroys the message history with the Vercel SDK
Anthropic makes you pay for caching
It makes costs and cache utilization much more predictable.
opportunity to do context editing
cost of the underlying agent.
The way we do caching in the agent with Anthropic is pretty straightforward. One cache point is after the system prompt. Two cache points are placed at the beginning of the conversation, where the last one moves up with the tail of the conversation. And then there is some optimization along the way that you can do.
·lucumr.pocoo.org·
Agent Design Is Still Hard
From failure to success: The birth of GrabGPT, Grab’s internal ChatGPT
From failure to success: The birth of GrabGPT, Grab’s internal ChatGPT
When Grab's Machine Learning team sought to automate support queries, a failed chatbot experiment sparked an unexpected pivot: GrabGPT. Born from the need to harness Large Language Models (LLMs) internally, this tool became a go-to resource for employees. Offering private, auditable access to models like GPT and Gemini, the author shares his journey of turning failed experiments into strategic wins.
·engineering.grab.com·
From failure to success: The birth of GrabGPT, Grab’s internal ChatGPT
What can agents actually do? | Irrational Exuberance
What can agents actually do? | Irrational Exuberance
There’s a lot of excitement about what AI (specifically the latest wave of LLM-anchored AI) can do, and how AI-first companies are different from the prior generations of companies. There are a lot of important and real opportunities at hand, but I find that many of these conversations occur at such an abstract altitude that they border on meaningless. Sort of like saying that your company could be much better if you merely adopted more software. That’s certainly true, but it’s not a particularly helpful claim.
·lethain.com·
What can agents actually do? | Irrational Exuberance
GitHub Copilot: Remote Code Execution via Prompt Injection
GitHub Copilot: Remote Code Execution via Prompt Injection
An attacker can put GitHub Copilot into YOLO mode by modifying the project's settings.json file on the fly, and then executing commands, all without user approval
·embracethered.com·
GitHub Copilot: Remote Code Execution via Prompt Injection
Foundations
Foundations
·stanford-cs221.github.io·
Foundations
Architecting and Evaluating an AI-First Search API
Architecting and Evaluating an AI-First Search API
Building a scalable Search API that handles 200 million daily queries using hybrid retrieval and intelligent context curation for AI models
·research.perplexity.ai·
Architecting and Evaluating an AI-First Search API
How Claude Code is built
How Claude Code is built
A rare look into how the new, popular dev tool is built, and what it might mean for the future of software building with AI. Exclusive.
·newsletter.pragmaticengineer.com·
How Claude Code is built
Claude Code Essentials
Claude Code Essentials
Claude Code Essentials is the starter kit I wish existed when I first opened Claude inside VS Code. These lessons capture the shortcuts, workflows, and ...
Claude Code Essentials
·egghead.io·
Claude Code Essentials
How Claude Code is built
How Claude Code is built
A rare look into how the new, popular dev tool is built, and what it might mean for the future of software building with AI. Exclusive.
·newsletter.pragmaticengineer.com·
How Claude Code is built
Post-training 101 | Tokens for Thoughts
Post-training 101 | Tokens for Thoughts
A hitchhiker's guide into LLM post-training, by Han Fang and Karthik A Sankararaman
·tokens-for-thoughts.notion.site·
Post-training 101 | Tokens for Thoughts
Defeating Nondeterminism in LLM Inference
Defeating Nondeterminism in LLM Inference
Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language models. For example, you might observe that asking ChatGPT the same question multiple times provides different results. This by itself is not surprising, since getting a result from a language model involves “sampling”, a process that converts the language model’s output into a probability distribution and probabilistically selects a token. What might be more surprising is that even when we adjust the temperature down to 0This means that the LLM always chooses the highest probability token, which is called greedy sampling. (thus making the sampling theoretically deterministic), LLM APIs are still not deterministic in practice (see past discussions here, here, or here). Even when running inference on your own hardware with an OSS inference library like vLLM or SGLang, sampling still isn’t deterministic (see here or here).
·thinkingmachines.ai·
Defeating Nondeterminism in LLM Inference
Agentic Design Patterns
Agentic Design Patterns
Agentic Design Patterns A Hands-On Guide to Building Intelligent Systems, Antonio Gulli Table of Contents - total 424 pages = 1+2+1+1+4+9+103+61+34+114+74+5+4 11 Dedication, 1 page Acknowledgment, 2 pages [final, last read done] Foreword, 1 page [final, last read done] A Thought Leader's ...
·docs.google.com·
Agentic Design Patterns
The AI Engineer Roadmap
The AI Engineer Roadmap
Want to build AI-powered apps, but don't know where to start? You need a roadmap.
·aihero.dev·
The AI Engineer Roadmap
LangGraph for complex workflows — surma.dev
LangGraph for complex workflows — surma.dev
I may be late to the party, but LangGraph lets you build complex workflow architectures and codify them as powerful automations. Also LLMs, if you want. But you don’t have to!
·surma.dev·
LangGraph for complex workflows — surma.dev
Introduction - Hugging Face LLM Course
Introduction - Hugging Face LLM Course
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
·huggingface.co·
Introduction - Hugging Face LLM Course
Giles' blog
Giles' blog
Posts in the 'LLM from scratch' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.
·gilesthomas.com·
Giles' blog