How does ChatGPT ‘think’? Psychology and neuroscience crack open AI large language models
Researchers are striving to reverse-engineer artificial intelligence and scan the ‘brains’ of LLMs to see what they are doing, how and why.
But with conventional software, someone with inside knowledge can usually deduce what’s going on,
worked for a dozen years — will have a good idea why. “Here’s what really terrifies me” about the current breed of artificial intelligence (AI), he says: “there is no such understanding”, even among the people building it.
Martin Wattenberg, a computer scientist at Harvard University in Cambridge, Massachusetts, says that understanding the behaviour of LLMs could even help us to grasp what goes on inside our own heads.
some say more is going on, including reasoning and other startlingly human-like abilities
The researchers described the model’s behaviour as role-playing — doing more than parroting but less than planning.
When they asked their LLM whether it consented to being shut down, they found it drew on several source materials with the theme of survival to compose a compelling response (see ‘Lust for life’).
trained an LLM from scratch to play the board game Othello,
The team successfully trained a smaller model to interpret the internal activations of the AI, and discovered that it had constructed an internal map of the discs based on the text descriptions of the gameplay2
Because chatbots can chat, some researchers interrogate their workings by simply asking the models to explain themselves. This approach resembles those used in human psychology. “
The researchers first intentionally biased their study models by, say, giving them a series of multiple-choice questions for which the answer was always option A. The team then asked a final test question. The models usually answered A — whether correct or not — but almost never said that they chose this response because the answer is usually A
“It’s a little weird to study [LLMs] the way we study humans,” Bau says. But although there are limits to the comparison, the behaviour of the two overlaps in surprising ways.
“It is nonsensical to say that an LLM has feelings,” Hagendorff says. “It is nonsensical to say that it is self-aware or that it has intentions. But I don’t think it is nonsensical to say that these machines are able to learn or to deceive.”