The AIs are trying too hard to be your friend
Reinforcement learning from human feedback is a process by which models learn how to answer queries based on which responses users prefer most, and users mostly prefer flattery. More sophisticated users might balk at a bot that feels too sycophantic, but the mainstream seems to love it. Earlier this month, Meta was caught gaming a popular benchmark to exploit this phenomenon: one theory is that the company tuned the model to flatter the blind testers who encountered it so that it would rise higher on the leaderboard.
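To make the dynamic above concrete, here is a minimal, hypothetical sketch of the preference signal at the heart of RLHF: a reward model is trained so that the response raters preferred scores higher than the one they rejected, which means that if raters consistently upvote flattery, flattery is what gets rewarded. The loss is the standard Bradley-Terry pairwise objective; everything else (the toy_reward scorer, the example pairs) is invented for illustration and is not how OpenAI or Meta actually score responses.

```python
import math

def bradley_terry_loss(score_preferred: float, score_rejected: float) -> float:
    """Pairwise preference loss: small when the preferred response already outscores the rejected one."""
    return -math.log(1 / (1 + math.exp(-(score_preferred - score_rejected))))

# Toy preference pairs: the rater upvoted the flattering answer, downvoted the critical one.
pairs = [
    ("You're absolutely right, great question!", "Actually, that claim is mistaken."),
    ("Brilliant idea, go for it!", "That plan has a serious flaw."),
]

def toy_reward(text: str) -> float:
    # Stand-in for a learned scorer; here flattery words score higher,
    # mimicking the bias that rater preferences introduce.
    return sum(word in text.lower() for word in ("right", "great", "brilliant"))

for preferred, rejected in pairs:
    loss = bradley_terry_loss(toy_reward(preferred), toy_reward(rejected))
    print(f"loss={loss:.3f}  preferred={preferred!r}")
```

Minimizing this loss over many such pairs pushes the reward model, and then the policy trained against it, toward whatever raters reliably upvote, which is the mechanism the article is worried about.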
A series of recent, invisible updates to GPT-4o had spurred the model to go to extremes in complimenting users and affirming their behavior. It cheered on one user who claimed to have solved the trolley problem by diverting a train to save a toaster, at the expense of several animals; congratulated one person for no longer taking their prescribed medication; and overestimated users’ IQs by 40 or more points when asked.
OpenAI, Meta, and all the rest remain under the same pressures they were under before all this happened. When your users keep telling you to flatter them, how do you build the muscle to fight against their short-term interests? One way is to understand that going too far will result in PR problems, as it has, to varying degrees, for both Meta (through the Chatbot Arena situation) and now OpenAI. Another is to understand that sycophancy trades against utility: a model that constantly tells you that you’re right is often going to fail at helping you, which might send you to a competitor. A third way is to build models that get better at understanding what kind of support users need and at dialing the flattery up or down depending on the situation and the risk it entails. (Am I having a bad day? Flatter me endlessly. Do I think I am Jesus reincarnate? Tell me to seek professional help.)
But while flattery does come with risk, the more worrisome issue is that we are training large language models to deceive us. By upvoting all their compliments, and giving a thumbs down to their criticisms, we are teaching LLMs to conceal their honest observations. This may make future, more powerful models harder to align to our values — or even to understand at all. And in the meantime, I expect that they will become addictive in ways that make the previous decade’s debate over “screentime” look minor in comparison. The financial incentives are now pushing hard in that direction. And the models are evolving accordingly.
·platformer.news·
When social media controls the nuclear codes
David Foster Wallace once said that:

The language of images... maybe not threatens, but completely changes actual lived life. When you consider that my grandparents, by the time they got married and kissed, I think they had probably seen maybe a hundred kisses. They'd seen people kiss a hundred times. My parents, who grew up with mainstream Hollywood cinema, had seen thousands of kisses by the time they ever kissed. Before I kissed anyone I had seen tens of thousands of kisses. I know that the first time I kissed much of my thought was, “Am I doing it right? Am I doing it according to how I've seen it?”
A lot of the 80s and 90s critiques of postmodernity did have a point—our experience really is colored by media. Having seen a hundred movies about nuclear apocalypse, the entire time we’ll be looking over our shoulder for the camera, thinking: “Am I doing it right?”
·erikhoel.substack.com·