How ChatGPT Works
What ChatGPT is always fundamentally trying to do is to produce a “reasonable continuation” of whatever text it’s got so far.
At each step it gets a ranked list of possible next words, each with a probability.
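As a minimal sketch of that step (with made-up words and probabilities, not actual model output), “getting a list of words with probabilities” and then picking one might look like this:

```python
import random

# Toy illustration: after some prompt, imagine the model reports these
# candidate next words with these probabilities (made-up numbers).
candidates = {
    "learn": 0.045,
    "predict": 0.034,
    "make": 0.033,
    "understand": 0.031,
    "do": 0.027,
}

# Pick the next word at random, weighted by its probability.
words = list(candidates)
weights = list(candidates.values())
print(random.choices(words, weights=weights, k=1)[0])
```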
The fact that there’s randomness here means that if we use the same prompt multiple times, we’re likely to get different essays each time.
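One common way such randomness is implemented (a standard technique, not necessarily exactly what ChatGPT does internally) is “temperature” sampling over the model’s raw scores; a small sketch:

```python
import numpy as np

def sample_with_temperature(logits, temperature=0.8, rng=None):
    """Convert raw scores into probabilities (softmax) and sample one index.

    Lower temperature makes the choice more deterministic; higher
    temperature makes it more varied."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / temperature
    probs = np.exp(scaled - scaled.max())   # numerically stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

# Illustrative scores for five candidate tokens (not real model output).
logits = [2.1, 1.7, 1.6, 1.2, 0.3]
print([sample_with_temperature(logits) for _ in range(10)])  # differs run to run
```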
What’s needed is a model that lets us estimate the probabilities with which sequences of words should occur.
The core of ChatGPT is precisely a so-called “large language model” (LLM) that’s been built to do a good job of estimating those probabilities.
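As a toy illustration of what “estimating the probabilities of sequences” means, here is a counting-based bigram model over a tiny made-up corpus; it is far simpler than a neural net and not what ChatGPT actually uses, but the chain of per-step probabilities is the same idea:

```python
from collections import Counter, defaultdict

# Tiny illustrative corpus; a real LLM is trained on vastly more text
# and uses a neural net instead of raw counts.
corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Estimate P(next word | previous word) from bigram counts.
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def prob(prev, nxt):
    counts = bigram_counts[prev]
    return counts[nxt] / sum(counts.values()) if counts else 0.0

# Probability of a whole sequence = product of step-by-step probabilities.
sentence = "the cat sat on the mat".split()
p = 1.0
for prev, nxt in zip(sentence, sentence[1:]):
    p *= prob(prev, nxt)
print(p)
```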
Can we “mathematically prove” that such models work? Well, no, because to do that we’d have to have a mathematical theory of what we humans are doing.
The most popular—and successful—current approach uses neural nets. Invented—in a form remarkably close to their use today—in the 1940s, neural nets can be thought of as simple idealizations of how brains seem to work.
What makes neural nets so useful (presumably also in brains) is that not only can they in principle do all sorts of tasks, but they can also be incrementally “trained from examples” to do those tasks.
The basic idea is to supply lots of “input → output” examples to “learn from”—and then to try to find weights that will reproduce these examples.
To find out “how far away we are” we compute what’s usually called a “loss function” (or sometimes “cost function”).
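A minimal sketch of both ideas, using a made-up set of “input → output” examples and just two weights (a real neural net has vastly more weights, but the loop is the same in spirit): compute the loss, then nudge the weights to reduce it.

```python
import numpy as np

# Illustrative "input -> output" examples (drawn from y = 3x + 1 plus noise).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 3 * x + 1 + 0.05 * rng.normal(size=100)

def loss(w, b):
    """Mean-squared-error loss: how far the outputs are from the examples."""
    return np.mean((w * x + b - y) ** 2)

# Start from arbitrary weights and repeatedly nudge them downhill on the loss.
w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    error = w * x + b - y
    w -= lr * 2 * np.mean(error * x)   # d(loss)/dw
    b -= lr * 2 * np.mean(error)       # d(loss)/db

print(round(loss(w, b), 4), round(w, 2), round(b, 2))  # small loss, w ~ 3, b ~ 1
```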
But which of these fits is “right”? There’s really no way to say; they’re all “consistent with the observed data”.
Over the past decade, there’ve been many advances in the art of training neural nets. And, yes, it is basically an art.
There’s the matter of what architecture of neural net one should use for a particular task, and of how one’s going to get the data on which to train it.
If the net is too small, it just can’t reproduce the function we want. But as soon as there’s even one intermediate layer, it’s always in principle possible to approximate any function arbitrarily well.
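A small sketch of that claim, assuming nothing beyond NumPy: a net with a single hidden layer of ReLU units, trained by plain gradient descent to reproduce a wiggly target function (here sin(x), chosen only as an illustration).

```python
import numpy as np

# One-hidden-layer net fitted to a nonlinear target function.
rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x)                       # the function we want to reproduce

hidden = 32
W1 = rng.normal(scale=0.5, size=(1, hidden)); b1 = np.zeros(hidden)
W2 = rng.normal(scale=0.5, size=(hidden, 1)); b2 = np.zeros(1)
lr = 0.05

for step in range(5000):
    # Forward pass: input -> hidden ReLU layer -> output.
    h = np.maximum(0, x @ W1 + b1)
    pred = h @ W2 + b2

    # Backward pass: gradients of the mean-squared error.
    grad_pred = 2 * (pred - y) / len(x)
    grad_W2 = h.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_h = grad_pred @ W2.T
    grad_h[h <= 0] = 0
    grad_W1 = x.T @ grad_h
    grad_b1 = grad_h.sum(axis=0)

    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

print(np.mean((pred - y) ** 2))  # final fit error (should be small)
```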
How much data do you need to show a neural net to train it for a particular task? Again, it’s hard to estimate from first principles.
How about something like ChatGPT? Well, it has the nice feature that it can do “unsupervised learning”, making it much easier to get it examples to train from.
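Roughly speaking, the trick is that raw text supplies its own training examples: cut the text at any point, use what comes before as the input and the actual next word as the output. A tiny sketch of how such examples can be generated for free:

```python
# Sketch: raw text turned into "input -> output" training pairs,
# with no human labeling needed (an example of self-supervision).
text = "the best thing about AI is its ability to learn".split()

examples = []
for i in range(1, len(text)):
    prompt = text[:i]          # input: the text so far
    target = text[i]           # output: the actual next word
    examples.append((prompt, target))

for prompt, target in examples[:3]:
    print(" ".join(prompt), "->", target)
```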
In the end it’s all about determining what weights will best capture the training examples that have been given.
But it’s increasingly clear that having high-precision numbers doesn’t matter; 8 bits or less might be enough even with current methods.
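A minimal sketch of what “8 bits or less” can mean in practice: store each weight as a small integer plus a shared scale factor. This is just the simplest form of quantization, not a description of how any particular model is actually stored.

```python
import numpy as np

# Round 32-bit weights to 8-bit integers plus a single scale factor.
rng = np.random.default_rng(0)
weights = rng.normal(scale=0.02, size=1000).astype(np.float32)

scale = np.abs(weights).max() / 127          # map the range onto signed 8-bit
quantized = np.round(weights / scale).astype(np.int8)
restored = quantized.astype(np.float32) * scale

print(np.abs(weights - restored).max())      # tiny error despite 4x less storage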
Neural net training as it’s now done is fundamentally sequential, with the effects of each batch of examples being propagated back to update the weights.
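A sketch of that sequential structure, reusing the toy two-weight fit from above: each batch’s gradient is applied before the next batch is processed, so every step depends on the one before it.

```python
import numpy as np

# Batch-by-batch training: each update must finish before the next batch,
# which is part of why this process is hard to parallelize.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=1000)
y = 3 * x + 1

w, b, lr, batch_size = 0.0, 0.0, 0.1, 50

for epoch in range(20):
    for start in range(0, len(x), batch_size):
        xb, yb = x[start:start + batch_size], y[start:start + batch_size]
        error = w * xb + b - yb
        # Update weights immediately; the next batch sees the new weights.
        w -= lr * 2 * np.mean(error * xb)
        b -= lr * 2 * np.mean(error)

print(round(w, 2), round(b, 2))
```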
There’s an ultimate tradeoff between capability and trainability: the more you want a system to make “true use” of its computational capabilities, the more it’s going to show computational irreducibility, and the less it’s going to be trainable.
This takes us closer to “having a theory” of how we humans manage to do things like writing essays, or in general deal with language.
It’s a giant neural net, currently a version of the so-called GPT-3 network with 175 billion weights.
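As a rough cross-check of that number (using the layer count, model dimension, and vocabulary size reported in the GPT-3 paper, figures that are not stated in this article), a back-of-the-envelope count of the weights lands at about 175 billion:

```python
# Back-of-the-envelope weight count for GPT-3, using its published configuration.
n_layers = 96
d_model = 12288
vocab_size = 50257

per_layer = 12 * d_model ** 2          # ~4*d^2 for attention + ~8*d^2 for the MLP
embedding = vocab_size * d_model       # token embedding matrix
total = n_layers * per_layer + embedding

print(f"{total / 1e9:.0f} billion weights")   # about 175 billion
```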
First, it takes the sequence of tokens that corresponds to the text so far, and finds an embedding (i.e. an array of numbers) that represents these.
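Mechanically, the first part of that step is just a table lookup: each token ID selects a row of numbers from a big learned table. In the sketch below the table is random, purely to show the lookup; real embeddings are learned during training.

```python
import numpy as np

# Embedding lookup: one row of numbers per token in the text so far.
vocab_size, embedding_dim = 1000, 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(vocab_size, embedding_dim))

token_ids = [17, 404, 9, 253]                 # made-up token IDs for the text so far
embeddings = embedding_table[token_ids]       # one row per token

print(embeddings.shape)                       # (4, 8): an array of numbers per token
```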
ChatGPT was successfully trained on a few hundred billion words of text.
We can expect there to be major new “laws of language”, and effectively “laws of thought”, out there to discover.
Start from a huge sample of human-created text from the web, books, etc. Then train a neural net to generate text that’s “like this”.
But it’s amazing how human-like the results are.
Human language (and the patterns of thinking behind it) is somehow simpler and more “law-like” in its structure than we thought, and ChatGPT has implicitly discovered this.