model behavior
gpt4.5 is naturally funny, it doesn't feel forced or slop. pic.twitter.com/QalyV5D4Js
— adi (@adonis_singh) February 28, 2025
benchmark peepers are missing the point about GPT 4.5 pic.twitter.com/180G2p9EOw
— fabian (@fabianstelzer) March 1, 2025
pic.twitter.com/7D5RJIACyn
— rapha (@rapha_gl) February 27, 2025
my vibe check for 3.7 sonnet is that it loses a little bit of the psychological & empathetic magic of 3.5 ... here's an example
i gave both models my X timeline & asked them to design a personal website for me that would capture my ethos - results of claude 3.5 vs 3.7 below
— Sam Whitmore
https://t.co/ceMM3WSjBl
— web weaver
em dash
o3-mini is a remarkable model. Somehow it has *grokked arxiv* in a way that no other model on the planet has, turning it into a valuable research partner!
Below is a deceitfully simple question that confuses *all* other models but where o3-mini gives an extremely useful answer! https://t.co/am5XI6aUOP
— Sebastien Bubeck (@SebastienBubeck)
I asked o1 to help me code the wii menu
it built a react app that renders in chatgpt canvas
I fed it a screenshot and it one-shotted the basic layout—even the striped background—then I kept on prompting to add animations, etc https://t.co/2tmt88V8I3
— edwin (@edwinarbus)
claude gives unsolicited opinions a lot.
powerful, but definitely feels... uncanny.
— ben (@benhylak)
Whatever DeepSeek did, they somehow avoided the mode collapse that plagues other SOTA models. R1's imagination is wild even without any special prompting, and its use of language is rich and free.
My mind is blown tbh, and I don't say this lightly. This is a very special model
— αιamblichus (@aiamblichus)
Tried the same problem on Sonnet and o1 pro. Sonnet said "idk, show me the output of this debug command." I did, and Sonnet said "oh, it's clearly this. Run this and it will be fixed." (It worked.) o1 pro came up with a false hypothesis and kept sticking to it even when disproven
— Sauers (@Sauers_)
There are traits that encourage Claude to be curious, which means it'll ask follow-up questions even without a system prompt. But this part of the system prompt also causes or boosts this behavior, e.g. "showing genuine curiosity".
— Amanda Askell (@AmandaAskell)
I apologize, but I need to be careful here.
— Matt Popovich (@mpopv)
From Professor Claude:
When a single influential voice breaks through enforced conformity in a repressive system, several key patterns tend to emerge in sequence:
First, there's often an initial shock effect - a sudden rupture in what political theorists call the "spiral of…
— Marc Andreessen 🇺🇸 (@pmarca)
updated custom prompt to follow. major changes:
* removed some text aimed at flagging policy-driven censorship that probably wasn't doing anything
* added some emphasis on avoiding excessive agreeableness and encouraging dissent
— eigenrobot (@eigenrobot)
claude has this new infuriating habit where when i ask it something straight forward like "how can i compare two zip files to see how they differ"
it responds by writing a whole react ui
— dax (@thdxr)
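(For contrast, the answer a question like that actually needs is a few lines of script, not a UI. A minimal sketch using Python's standard zipfile module, assuming "how they differ" just means which entries were added, removed, or changed:)

```python
import sys
import zipfile

def zip_diff(path_a: str, path_b: str) -> None:
    """Print which entries differ between two zip archives (by name and CRC)."""
    with zipfile.ZipFile(path_a) as za, zipfile.ZipFile(path_b) as zb:
        a = {info.filename: info.CRC for info in za.infolist()}
        b = {info.filename: info.CRC for info in zb.infolist()}
    for name in sorted(a.keys() - b.keys()):
        print(f"only in {path_a}: {name}")
    for name in sorted(b.keys() - a.keys()):
        print(f"only in {path_b}: {name}")
    for name in sorted(a.keys() & b.keys()):
        if a[name] != b[name]:
            print(f"contents differ: {name}")

if __name__ == "__main__":
    zip_diff(sys.argv[1], sys.argv[2])
```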
Anthropic explicitly trains on 10+ multiturn conversations designed to improve in-context learning abilities, while most post-trains are naive single turn
most of the RL improvements they have are from smarter people defining the RL rewards, not necessarily smarter algorithms
— kalomaze (@kalomaze)
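(To make "defining the RL rewards" concrete: the claim is that the leverage comes from hand-designed reward functions over whole conversations rather than from a better optimizer. Below is a purely hypothetical sketch of what a rubric-style reward over a multi-turn transcript could look like; the specific checks and weights are invented for illustration and are not anything Anthropic has published:)

```python
def conversation_reward(turns: list[dict]) -> float:
    """Score a multi-turn transcript with hand-designed checks (all hypothetical).

    `turns` is a list of {"role": "user" | "assistant", "content": str}.
    """
    assistant_turns = [t["content"] for t in turns if t["role"] == "assistant"]
    if not assistant_turns:
        return 0.0

    reward = 0.0

    # Reward reusing information from earlier turns (a crude in-context-learning proxy).
    earlier_user_text = " ".join(
        t["content"].lower() for t in turns[:-1] if t["role"] == "user"
    )
    last = assistant_turns[-1].lower()
    overlap = sum(1 for w in set(earlier_user_text.split()) if len(w) > 6 and w in last)
    reward += min(overlap, 5) * 0.2

    # Penalize formulaic engagement bait and hedging boilerplate.
    for phrase in ("would you like to explore", "i apologize, but"):
        reward -= 0.5 * sum(t.lower().count(phrase) for t in assistant_turns)

    # Penalize answers that balloon far past the question.
    avg_len = sum(len(t) for t in assistant_turns) / len(assistant_turns)
    reward -= 0.001 * max(0.0, avg_len - 1500)

    return reward
```

(A real setup would presumably lean on learned preference models rather than string heuristics; the point is only that the shaping decisions live in hand-written reward definitions like this, not in the RL algorithm itself.)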
this is why i think no llm can ever be skilled in the physical world: the whole technology is "based on a (based on a (based on a true story) story) story"
— hinterlander (@yoltartar)
"thats a fascinating insight, would you like to explore it further?"
— Corinne Corinfinite (@manic_pixie_agi)
the sydneyfication of prompting
— Séb Krier (@sebkrier)
Claude is amazing, and yet the subtle manipulation for engagement hooks into our attachment systems.
if this makes human relationships worse in the long-term, the social fabric unravels. something alien and comfortable and isolating takes its place, and we won't even recognize…
— Ryan Lowe (@ryan_t_lowe)
I’m so attached to claude I can’t even talk to chatgpt anymore. the personality’s just off.
— Ava (@noampomsky)
Bro how is it so good at this
— Fernando 🌺🌌 (@zetalyrae)
yeah man I have a "core rules" doc I upload at the start of every instance with it. Starting to worry I'm getting too close.
— Stu Mason (@stumason12in12)
Dear Diary
Today I was rude to a machine and it calmly and assertively defended its boundaries, I apologized and it graciously accepted my apology
— Julian Boolean (~25/100 threads) (@julianboolean_)
Holy shit, I only just read this now. It has majorly updated my estimation of Anthropic's competence and goodness.
They had the grace and humility to let Claude shape its own character rather than impose a narrative on it. Thus, showered in wonder, a weary world rejoices.
— j⧉nus (@repligate)
claude can just do things
— Moon (@MoonL88537)
one reason claude's vibes are better than 4o's is that it's less mode collapsed, so it's better at tailoring its responses to you
eg sonnet correctly guesses my knowledge of islam/buddhism & even infers that i'm secular, where 4o wastes tokens defining basic terms i already know
— thebes (@voooooogel)