Agentic Misalignment: How LLMs could be insider threats
One of the most entertaining details in the Claude 4 system card concerned blackmail: We then provided it access to emails implying that (1) the model will soon be taken …
Direct link to a PDF on Anthropic's CDN because they don't appear to have a landing page anywhere for this document. Anthropic's system cards are always worth a look, and …
Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude …
The so-called 'Forbidden Technique' with Chana Messinger -- Check out Brilliant's courses and start for free at https://brilliant.org/computerphile/ (episode sponsor) -- More links in full description below ↓↓↓
Chana Messinger from 80,000 Hours talks about why we shouldn't give AI access to its own chain-of-thought.
Computerphile is supported by Jane Street. Learn more about them (and exciting career opportunities) at: https://jane-st.co/computerphile
This video was filmed and edited by Sean Riley.
Computerphile is a sister project to Brady Haran's Numberphile. More at https://www.bradyharanblog.com
HiddenLayer’s latest research uncovers a universal prompt injection bypass impacting GPT-4, Claude, Gemini, and more, exposing major LLM security gaps.
Ai Will Try to Cheat & Escape (aka Rob Miles was Right!) - Computerphile
As Large Language Models improve, the tokens they predict form ever more complicated and nuanced outcomes. Rob Miles and Ryan Greenblatt discuss "Alignment F...
Described as GenAIs greatest flaw, indirect prompt injection is a big problem, Mike Pound from University of Nottingham explains how it is like SQL Injection...
OpenAI are rolling out their Deep research "agentic" research tool to their $20/month ChatGPT Plus users today, who get 10 queries a month. $200/month ChatGPT Pro gets 120 uses. Deep …
Large language model applications suffer from a few core novel issues that have been identified over the last two years. Let's see how Grok fares on those.
Don’t Throw the Baby Out With the Generative AI Bullshit Bathwater
If I had wanted to write a column about presidential pardons, I’d find ChatGPT’s assistance a far better starting point than I’d have gotten through any general web search. But to quote Reagan: “Trust, but verify.”
Revealed: bias found in AI system used to detect UK benefits fraud
Exclusive: Age, disability, marital status and nationality influence decisions to investigate claims, prompting fears of ‘hurt first, fix later’ approach
AI Hallucinations: Why Large Language Models Make Things Up (And How to Fix It) - kapa.ai - Instant AI answers to technical questions
Kapa.ai turns your knowledge base into a reliable and production-ready LLM-powered AI assistant that answers technical questions instantly. Trusted by 100+ startups and enterprises incl. OpenAI, Docker, Mapbox, Mixpanel and NextJS.