Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language models.
For example, you might observe that asking ChatGPT the same question multiple times provides different results. This by itself is not surprising, since getting a result from a language model involves “sampling”, a process that converts the language model’s output into a probability distribution and probabilistically selects a token.
What might be more surprising is that even when we adjust the temperature down to 0 (greedy sampling, where the LLM always chooses the highest-probability token, thus making the sampling theoretically deterministic), LLM APIs are still not deterministic in practice (see past discussions here, here, or here). Even when running inference on your own hardware with an OSS inference library like vLLM or SGLang, sampling still isn’t deterministic (see here or here).
A very common question I see about LLMs concerns why they can't be made to deliver the same response to the same prompt by setting a fixed random number seed. …
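For illustration only, a toy sample_token helper (my own sketch, not the sampler any of these APIs or libraries actually run) shows why temperature 0 is nominally deterministic, and why fixing a seed only pins down the random draw, not the logits it draws from:

```python
import numpy as np

def sample_token(logits, temperature=1.0, seed=None):
    """Toy next-token sampler: temperature 0 degenerates to greedy (argmax)."""
    if temperature == 0.0:
        # Greedy sampling: always pick the highest-probability token,
        # so no randomness is involved at all.
        return int(np.argmax(logits))
    # Softmax over temperature-scaled logits, then draw one token.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    rng = np.random.default_rng(seed)
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.5])
print(sample_token(logits, temperature=0.0))           # always token 0
print(sample_token(logits, temperature=0.7, seed=42))  # seeded draw: reproducible only
                                                       # if the logits are bit-identical
```

The catch, and the point of the post above, is that in real serving stacks the logits themselves can differ run to run (batching, kernel scheduling, floating-point non-associativity), so neither temperature 0 nor a fixed seed is enough on its own.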
Will Amazon S3 Vectors Kill Vector Databases—or Save Them? - Zilliz blog
AWS S3 Vectors aims for 90% cost savings for vector storage. But will it kill vectordbs like Milvus? A deep dive into costs, limits, and the future of tiered storage.
GitHub - Varietyz/Disciplined-AI-Software-Development: This methodology provides a structured approach for collaborating with AI systems on software development projects. It addresses common issues like code bloat, architectural drift, and context dilution through systematic constraints and validation checkpoints.
GPT-5 Thinking in ChatGPT (aka Research Goblin) is shockingly good at search
“Don’t use chatbots as search engines” was great advice for several years... until it wasn’t. I wrote about how good OpenAI’s o3 was at using its Bing-backed search tool back …
Learn how to create custom chat modes in VS Code for GitHub Copilot to enhance your workflow in large, complex projects with specialized AI configurations.
Introducing the Awesome GitHub Copilot Customizations repo - Microsoft for Developers
Today we’re excited to announce the launch of the Awesome GitHub Copilot Customizations repo! The Awesome Copilot repo is a community-driven resource with custom instructions, reusable prompts, and custom chat modes that helps you get consistent AI assistance. In other words, Awesome Copilot helps you get the most out of GitHub Copilot by letting you tailor it […]
Testing VLMs and LLMs for robotics w/ the Jetson Thor devkit
Exploring the Jetson Thor devkit w/ some local LLMs and VLMs. More info on the Jetson Thor Devkit: https://nvda.ws/45xIU4B Neural Networks from Scratch book: h...
The GPT-5 launch was uh, rough. A lot went wrong here, and I want to talk about what really happened... Thank you Kilo Code for sponsoring! Check them out at:...
What makes Claude Code so damn good (and how to recreate that magic in your agent)!?
Claude Code is the most delightful AI agent/workflow I have used so far. Not only does it make targeted edits or vibe coding throwaway tools less annoying, ...
too many model context protocol servers and LLM allocations on the dance floor
This blog post intends to be a definitive guide to context engineering fundamentals from the perspective of an engineer who builds commercial coding assistants and harnesses for a living.
Just two weeks ago, I was back over in San Francisco, and there was a big event on Model Context Protocol
too many model context protocol servers and LLM allocations on the dance floor
Useful reminder from Geoffrey Huntley of the infrequently discussed but significant token cost of using MCP. Geoffrey estimates that the usable context window of something like Amp or Cursor is around …
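A back-of-envelope sketch of where that cost comes from (every number below is a placeholder I've assumed for illustration, not a figure from Geoffrey, Amp, or Cursor): each connected MCP server injects its tool names, descriptions, and JSON schemas into the prompt, so the overhead scales multiplicatively with servers and tools.

```python
# Back-of-envelope sketch of how MCP tool definitions eat into a context window.
# All numbers are hypothetical placeholders, not measurements from any real setup.
context_window = 200_000        # assumed total context window, in tokens
num_mcp_servers = 10            # assumed number of connected MCP servers
tools_per_server = 15           # assumed tools exposed by each server
tokens_per_tool_schema = 400    # assumed tokens per tool name + description + JSON schema

tool_overhead = num_mcp_servers * tools_per_server * tokens_per_tool_schema
print(f"Tool definitions: {tool_overhead:,} tokens "
      f"(~{tool_overhead / context_window:.0%} of a {context_window:,}-token window)")
# Tool definitions: 60,000 tokens (~30% of a 200,000-token window)
```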
AI Agents Need Data Integrity - Schneier on Security
Think of the Web as a digital territory with its own social contract. In 2014, Tim Berners-Lee called for a “Magna Carta for the Web” to restore the balance of power between individuals and institutions. This mirrors the original charter’s purpose: ensuring that those who occupy a territory have a meaningful stake in its governance. Web 3.0—the distributed, decentralized Web of tomorrow—is finally poised to change the Internet’s dynamic by returning ownership to data creators. This will change many things about what’s often described as the “CIA triad” of ...