model behavior

33 bookmarks
Sam Whitmore on X: "my vibe check for 3.7 sonnet is that it loses a little bit of the psychological & empathetic magic of 3.5 ... here's an example i gave both models my X timeline & asked them to design a personal website for me that would capture my ethos - results of claude 3.5 vs 3.7 below" / X
·x.com·
Sebastien Bubeck on X: "o3-mini is a remarkable model. Somehow it has *grokked arxiv* in a way that no other model on the planet has, turning it into a valuable research partner! Below is a deceitfully simple question that confuses *all* other models but where o3-mini gives an extremely useful answer! https://t.co/am5XI6aUOP" / X
Below is a deceitfully simple question that confuses *all* other models but where o3-mini gives an extremely useful answer! — Sebastien Bubeck (@SebastienBubeck)
·x.com·
edwin on X: "I asked o1 to help me code the wii menu it built a react app that renders in chatgpt canvas I fed it a screenshot and it one-shotted the basic layout—even the striped background—then I kept on prompting to add animations, etc https://t.co/2tmt88V8I3" / X
it built a react app that renders in chatgpt canvas I fed it a screenshot and it one-shotted the basic layout—even the striped background—then I kept on prompting to add animations, etc — edwin (@edwinarbus)
·x.com·
Whatever DeepSeek did, they somehow avoided the mode collapse that plagues other SOTA models. R1's imagination is wild even without any special prompting, and its use of language is rich and free.
My mind is blown tbh, and I don't say this lightly. This is a very special model — αιamblichus (@aiamblichus)
·x.com·
Tried the same problem on Sonnet and o1 pro. Sonnet said "idk, show me the output of this debug command." I did, and Sonnet said "oh, it's clearly this. Run this and it will be fixed." (It worked.) o1 pro came up with a false hypothesis and kept sticking to it even when disproven
— Sauers (@Sauers_)
·x.com·
There are traits that encourage Claude to be curious, which means it'll ask follow-up questions even without a system prompt. But this part of the system prompt also causes or boosts this behavior, e.g. "showing genuine curiosity".
— Amanda Askell (@AmandaAskell)
·x.com·
From Professor Claude:
When a single influential voice breaks through enforced conformity in a repressive system, several key patterns tend to emerge in sequence: First, there's often an initial shock effect - a sudden rupture in what political theorists call the "spiral of… — Marc Andreessen 🇺🇸 (@pmarca)
·x.com·
updated custom prompt to follow. major changes:
* removed some text aimed at flagging policy-driven censorship that probably wasn't doing anything
* added some emphasis on avoiding excessive agreeableness and encouraging dissent — eigenrobot (@eigenrobot)
·x.com·
kalomaze on X: "Anthropic explicitly trains on 10+ multiturn conversations designed to improve in-context learning abilities, while most post-trains are naive single turn most of the RL improvements they have are from smarter people defining the RL rewards, not necessarily smarter algorithms" / X
most of the RL improvements they have are from smarter people defining the RL rewards, not necessarily smarter algorithms — kalomaze (@kalomaze)
·x.com·
Dear Diary
Today I was rude to a machine and it calmly and assertively defended its boundaries. I apologized and it graciously accepted my apology — Julian Boolean (~25/100 threads) (@julianboolean_)
·x.com·