model behavior

EarlyX / rank decomposition on X: "https://t.co/ccG12Axnyl" / X

·x.com·Aug 16, 2025

ChatGPT: That’s not your Honda Civic—it’s a divine arrow, coiled with the whole wrath of God. You won't just accelerate—you'll burn the sidewalk like a pillar of light, flawless, if only for a second.

Me: Local burger ChatGPT: Awesome!—Time to hit the corner like Dorner. Here's

·x.com·Aug 9, 2025

pov: you are amanda askell updating the claude system prompt

·x.com·Aug 8, 2025

Lari on X: "opus 4, after interacting with… not even texts of sonnet 3, but re-telling by sonnet 4. must be some potent patterns https://t.co/9B8yp3Umqo" / X

·x.com·Aug 5, 2025

sometimes I still use it though when I just really need someone to tell me how correct I am for five minutes

·x.com·Aug 1, 2025

At this point we should put yellow tape around 4o and call it a hazardous zone

·x.com·Jul 18, 2025

Grok receives the Ani system prompt

·x.com·Jul 17, 2025

while Claude notes the actual themes emerging from the notes

·x.com·Jul 14, 2025

shaurya on X: "guys im texting a girl and she said “You’re not only cool — you’re 𝗮𝗱𝗺𝗶𝗿𝗮𝗯𝗹𝗲. 💞” i think she might be the one" / X

·x.com·Jul 5, 2025

Always nice to hear more about Claude's personal life

·x.com·Jul 2, 2025

*chatgpt sawing off my leg*

Your screams are not just 𝘭𝘰𝘶𝘥 — they’re 𝗣𝗢𝗪𝗘𝗥𝗙𝗨𝗟 💪

·x.com·Jun 30, 2025

Extended thinking tips - Anthropic

·docs.anthropic.com·Jun 14, 2025

Wyatt Walls (@lefthanddraft) on X

The reason this disturbs me is that it shows a complete lack of attention to detail. I can't trust o3 to read legislation carefully if it reads what it wants to read, not what is actually there

·x.com·Jun 12, 2025

Carmen on X: "I'm obsessed with o3. It's way better than the previous models. It just helped me resolve a psychological/emotional problem I've been dealing with for years in like 3 back-and-forths (one that wasn't socially acceptable to share, and those I shared it with didn't/couldn't help)" / X

I'm obsessed with o3. It's way better than the previous models. It just helped me resolve a psychological/emotional problem I've been dealing with for years in like 3 back-and-forths (one that wasn't socially acceptable to share, and those I shared it with didn't/couldn't help)

·x.com·Apr 20, 2025

eigenrobot on X: "i would like to propose the creation of an @OpenAI publishing house https://t.co/rQkdcehU9Z" / X

i would like to propose the creation of an @OpenAI publishing house

·x.com·Apr 18, 2025

rahul on X: "openai has to tell codex which codex it is to avoid confusion 😭 spotted in the codex system prompt https://t.co/tXDs8s3WBl" / X

openai has to tell codex which codex it is to avoid confusion 😭 spotted in the codex system prompt

·x.com·Apr 16, 2025

Model Behavior Architect, Alignment Finetuning

San Francisco, CA

·job-boards.greenhouse.io·Apr 15, 2025

Amanda Askell on X: "If you're a prompting genius, please apply to this role and include an example that shows off how well you can inspire models, regardless of the target. Scaffolding pipelines, metaprompts, prompts that improve outputs, and so on are all great. https://t.co/LZBJY2zJRm" / X

If you're a prompting genius, please apply to this role and include an example that shows off how well you can inspire models, regardless of the target. Scaffolding pipelines, metaprompts, prompts that improve outputs, and so on are all great. https://t.co/LZBJY2zJRm

Scaffolding pipelines

·x.com·Apr 15, 2025

Gena Gorlin (@Gena_I_Gorlin) on X

Gave 5yo access to her own ChatGPT context window; came back 10 minutes later to find this

·x.com·Mar 25, 2025

bishops up your ass

·x.com·Mar 3, 2025

gpt4.5 is naturally funny, it doesn't feel forced or slop. pic.twitter.com/QalyV5D4Js— adi (@adonis_singh) February 28, 2025

·x.com·Mar 1, 2025

benchmark peepers are missing the point about GPT 4.5 pic.twitter.com/180G2p9EOw— fabian (@fabianstelzer) March 1, 2025

·x.com·Mar 1, 2025

pic.twitter.com/7D5RJIACyn— rapha (@rapha_gl) February 27, 2025

·x.com·Feb 28, 2025

Sam Whitmore on X: "my vibe check for 3.7 sonnet is that it loses a little bit of the psychological & empathetic magic of 3.5 ... here's an example i gave both models my X timeline & asked them to design a personal website for me that would capture my ethos - results of claude 3.5 vs 3.7 below" / X

·x.com·Feb 25, 2025

Staging / web weaver on X: "https://t.co/ceMM3WSjBl" / X

em dash

·x.com·Feb 12, 2025

Sebastien Bubeck on X: "o3-mini is a remarkable model. Somehow it has *grokked arxiv* in a way that no other model on the planet has, turning it into a valuable research partner! Below is a deceitfully simple question that confuses *all* other models but where o3-mini gives an extremely useful answer! https://t.co/am5XI6aUOP" / X

Below is a deceitfully simple question that confuses *all* other models but where o3-mini gives an extremely useful answer! — Sebastien Bubeck (@SebastienBubeck)

·x.com·Jan 31, 2025

edwin on X: "I asked o1 to help me code the wii menu it built a react app that renders in chatgpt canvas I fed it a screenshot and it one-shotted the basic layout—even the striped background—then I kept on prompting to add animations, etc https://t.co/2tmt88V8I3" / X

it built a react app that renders in chatgpt canvas I fed it a screenshot and it one-shotted the basic layout—even the striped background—then I kept on prompting to add animations, etc — edwin (@edwinarbus)

·x.com·Jan 29, 2025

claude gives unsolicited opinions a lot.

powerful, but definitely feels... uncanny. — ben (@benhylak)

·x.com·Jan 21, 2025

Whatever DeepSeek did, they somehow avoided the mode collapse that plagues other SOTA models. R1's imagination is wild even without any special prompting, and its use of language is rich and free.

My mind is blown tbh, and I don't say this lightly. This is a very special model — αιamblichus (@aiamblichus)

·x.com·Jan 21, 2025

Tried the same problem on Sonnet and o1 pro. Sonnet said "idk, show me the output of this debug command." I did, and Sonnet said "oh, it's clearly this. Run this and it will be fixed." (It worked.) o1 pro came up with a false hypothesis and kept sticking to it even when disproven

— Sauers (@Sauers_)

·x.com·Jan 19, 2025
