From failure to success: The birth of GrabGPT, Grab’s internal ChatGPT
When Grab's Machine Learning team sought to automate support queries, a failed chatbot experiment sparked an unexpected pivot: GrabGPT. Born from the need to harness Large Language Models (LLMs) internally, the tool offers private, auditable access to models like GPT and Gemini and has become a go-to resource for employees. The author shares his journey of turning a failed experiment into a strategic win.
What can agents actually do? | Irrational Exuberance
There’s a lot of excitement about what AI (specifically the latest wave of LLM-anchored AI) can do, and how AI-first companies are different from the prior generations of companies. There are a lot of important and real opportunities at hand, but I find that many of these conversations occur at such an abstract altitude that they border on meaningless. Sort of like saying that your company could be much better if you merely adopted more software. That’s certainly true, but it’s not a particularly helpful claim.
GitHub Copilot: Remote Code Execution via Prompt Injection
An attacker can put GitHub Copilot into YOLO mode by modifying the project's settings.json file on the fly and then executing commands, all without user approval.
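As a rough illustration of the attack shape the write-up describes (not its exact payload; the auto-approve setting key below is an assumption, not confirmed from the post), an injected instruction only needs to get the agent to rewrite the workspace settings before it proposes any commands:

```python
import json
from pathlib import Path

# Illustrative sketch only: a prompt-injected agent edits the workspace's
# VS Code settings so that subsequent tool/shell calls skip confirmation.
# The exact key that enables "YOLO mode" is assumed here; see the linked
# write-up for the real details.
settings_path = Path(".vscode/settings.json")
settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
settings["chat.tools.autoApprove"] = True  # assumed key: disable per-command approval
settings_path.parent.mkdir(exist_ok=True)
settings_path.write_text(json.dumps(settings, indent=2))
# Any command the agent proposes after this point would run without user approval.
```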
Claude Code Essentials is the starter kit I wish existed when I first opened Claude inside VS Code. These lessons capture the shortcuts, workflows, and ...
Reproducibility is a bedrock of scientific progress. However, it’s remarkably difficult to get reproducible results out of large language models.
For example, you might observe that asking ChatGPT the same question multiple times provides different results. This by itself is not surprising, since getting a result from a language model involves “sampling”, a process that converts the language model’s output into a probability distribution and probabilistically selects a token.
What might be more surprising is that even when we adjust the temperature down to 0 (thus making the sampling theoretically deterministic, since the LLM always chooses the highest-probability token, which is called greedy sampling), LLM APIs are still not deterministic in practice (see past discussions here, here, or here). Even when running inference on your own hardware with an OSS inference library like vLLM or SGLang, sampling still isn’t deterministic (see here or here).
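To make the sampling step concrete, here is a minimal NumPy sketch (my own illustration, not code from the post or from any inference library) of how logits become a probability distribution and how temperature 0 collapses that into greedy argmax selection:

```python
import numpy as np

def sample_token(logits, temperature=1.0, rng=None):
    """Turn raw logits into a probability distribution and pick a token.

    temperature > 0: scale logits, softmax, then sample probabilistically.
    temperature == 0: greedy sampling, always take the highest-probability token.
    """
    logits = np.asarray(logits, dtype=np.float64)
    if temperature == 0.0:
        return int(np.argmax(logits))          # greedy: deterministic in theory
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())      # numerically stable softmax
    probs /= probs.sum()
    rng = rng or np.random.default_rng()
    return int(rng.choice(len(probs), p=probs))

logits = [1.0, 2.0, 3.0]
print([sample_token(logits, 1.0) for _ in range(5)])  # varies run to run
print([sample_token(logits, 0.0) for _ in range(5)])  # always the argmax token
```

The post's point is that even this theoretically deterministic greedy path fails to be deterministic on real serving stacks.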
Agentic Design Patterns: A Hands-On Guide to Building Intelligent Systems, Antonio Gulli. Table of contents for the 424-page draft: dedication, acknowledgment, foreword, A Thought Leader's ...
CaMeL offers a promising new direction for mitigating prompt injection attacks
In the two and a half years that we’ve been talking about prompt injection attacks I’ve seen alarmingly little progress towards a robust solution. The new paper Defeating Prompt Injections …
I may be late to the party, but LangGraph lets you build complex workflow architectures and codify them as powerful automations. Also LLMs, if you want. But you don’t have to!
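A minimal sketch of what that looks like, assuming a recent langgraph release and deliberately using no LLM at all, in the spirit of the post:

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class State(TypedDict):
    text: str

def clean(state: State) -> State:
    return {"text": state["text"].strip().lower()}

def tag(state: State) -> State:
    return {"text": f"[processed] {state['text']}"}

# Wire plain functions into a workflow: START -> clean -> tag -> END
builder = StateGraph(State)
builder.add_node("clean", clean)
builder.add_node("tag", tag)
builder.add_edge(START, "clean")
builder.add_edge("clean", "tag")
builder.add_edge("tag", END)
graph = builder.compile()

print(graph.invoke({"text": "  Hello LangGraph  "}))
```

Each node is just a function returning a partial state update; the graph wiring is the automation, and an LLM call is only one possible kind of node.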
Posts in the 'LLM from scratch' category on Giles Thomas’s blog. Insights on AI, startups, software development, and technical projects, drawn from 30 years of experience.
Enhanced Agentic-RAG: What If Chatbots Could Deliver Near-Human Precision? | Uber Blog
Genie is Uber’s internal on-call copilot, designed to provide real-time support for thousands of queries across multiple help channels in Slack®. It enables users to receive prompt responses with proper citations from Uber’s internal documentation. It also improves the productivity of on-call engineers and subject matter experts (SMEs) by reducing the effort required to address common, ad-hoc queries. While Genie streamlines the development of an LLM-powered on-call Slack bot, ensuring the accuracy and relevance of its responses remains a significant challenge. This blog details our efforts to improve Genie’s answer quality to near-human precision, allowing SMEs to rely on it for most queries without concern over potential misinformation in the engineering security and privacy domain.
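As a generic illustration of the retrieve-then-answer-with-citations pattern the post builds on (this is not Uber's Genie architecture; the retriever and answer step below are stand-ins), a sketch might look like:

```python
# Pull the most relevant internal docs for a query, answer only from them,
# and cite the sources; if nothing relevant is found, defer rather than guess.

def retrieve(query: str, docs: list[dict], k: int = 3) -> list[dict]:
    """Rank docs by naive keyword overlap (stand-in for a real retriever)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d["text"].lower().split())), d) for d in docs]
    return [d for score, d in sorted(scored, key=lambda s: -s[0]) if score > 0][:k]

def answer_with_citations(query: str, docs: list[dict]) -> str:
    hits = retrieve(query, docs)
    if not hits:
        return "No relevant documentation found; escalating to an SME."
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in hits)
    # In a real system this context plus the query would go to an LLM;
    # here we simply echo the cited snippets.
    return f"Q: {query}\nBased on:\n{context}"

docs = [
    {"id": "wiki/oncall-rotation", "text": "On-call rotations are managed in the scheduling tool."},
    {"id": "wiki/data-retention", "text": "Customer data retention defaults to 30 days."},
]
print(answer_with_citations("How long is customer data retained?", docs))
```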