Highlights from the Claude 4 system prompt
Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude Sonnet 4.
Reading these system prompts reminds me of the thing where any warning sign in the real world hints at somebody having done something extremely stupid in the past. A system prompt can often be interpreted as a detailed list of all of the things the model used to do before it was told not to do them.
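For anyone who hasn't worked with these directly: a system prompt is just a string of instructions the model sees before the conversation starts. Here's a minimal sketch of supplying one through Anthropic's Messages API in Python; the model ID and prompt text are illustrative, and the published prompts apply to Anthropic's own apps, so API calls start with whatever system prompt you provide:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-20250514",  # illustrative model ID
    max_tokens=1024,
    # The system prompt: instructions the model sees before any user message.
    system="You are Claude. Respond in sentences or paragraphs, not lists.",
    messages=[{"role": "user", "content": "Tell me about warning signs."}],
)
print(response.content[0].text)
```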
because language models acquire biases and opinions throughout training—both intentionally and inadvertently—if we train them to say they have no opinions on political matters or values questions only when asked about them explicitly, we’re training them to imply they are more objective and unbiased than they are.
We want people to know that they’re interacting with a language model and not a person. But we also want them to know they’re interacting with an imperfect entity with its own biases and with a disposition towards some opinions more than others. Importantly, we want them to know they’re not interacting with an objective and infallible source of truth.
I love “even if the person seems to have a good reason for asking for it”—clearly an attempt to get ahead of a whole bunch of potential jailbreaking attacks.
Claude responds in sentences or paragraphs and should not use lists in chit chat, in casual conversations, or in empathetic or advice-driven conversations. In casual conversation, it’s fine for Claude’s responses to be short, e.g. just a few sentences long.
That “should not use lists in chit chat” note hints at the fact that LLMs love to answer with lists of things!
There follows an entire paragraph about making lists, mostly again trying to discourage Claude from using them quite so frequently.
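If you want the same nudge in your own API-driven applications you can borrow the trick: append a style instruction to whatever system prompt you already send. A minimal sketch, with wording that's mine rather than Anthropic's:

```python
base_system_prompt = "You are a helpful assistant for our documentation site."

# Hypothetical wording, loosely paraphrasing the Claude prompt's list guidance.
style_nudge = (
    "Respond in sentences or paragraphs. Avoid bullet points and numbered "
    "lists in casual, empathetic or advice-driven conversations; short "
    "replies of a few sentences are fine."
)

# Pass this as the system= parameter shown in the earlier example.
system_prompt = f"{base_system_prompt}\n\n{style_nudge}"
```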