AI/ML

2201 bookmarks
Let's build GPT: from scratch, in code, spelled out.
We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!). I recommend people watch the earlier makemore videos to get comfortable with the autoregressive language modeling framework and basics of tensors and PyTorch nn, which we take for granted in this video.

Links:
- Google colab for the video: https://colab.research.google.com/drive/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing
- GitHub repo for the video: https://github.com/karpathy/ng-video-lecture
- nanoGPT repo: https://github.com/karpathy/nanoGPT
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- our Discord channel: https://discord.gg/3zy8kqD9Cp

Supplementary links:
- Attention is All You Need paper: https://arxiv.org/abs/1706.03762
- OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
- OpenAI ChatGPT blog post: https://openai.com/blog/chatgpt/
- The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com . If you prefer to work in notebooks, I think the easiest path today is Google Colab.

Suggested exercises:
- EX1: The n-dimensional tensor mastery challenge: Combine the `Head` and `MultiHeadAttention` into one class that processes all the heads in parallel, treating the heads as another batch dimension (answer is in nanoGPT).
- EX2: Train the GPT on your own dataset of choice! What other data could be fun to blabber on about? (A fun suggestion if you like: train on all the possible 3-digit addition problems and predict the sum in the reverse order. Does your Transformer learn the correct addition algorithm? Does it correctly generalize to the validation set?).
- EX3: Find a dataset that is very large, so large that you can't see a gap between train and val loss. Pretrain the transformer on this data, then initialize with that model and finetune it on tiny shakespeare with a smaller number of steps and lower learning rate. Can you obtain a lower validation loss by the use of pretraining?
- EX4: Read some transformer papers and implement one additional feature or change that people seem to use. Does it improve the performance of your GPT?

Chapters:
00:00:00 intro: ChatGPT, Transformers, nanoGPT, Shakespeare baseline language modeling, code setup
00:07:52 reading and exploring the data
00:09:28 tokenization, train/val split
00:14:27 data loader: batches of chunks of data
00:22:11 simplest baseline: bigram language model, loss, generation
00:34:53 training the bigram model
00:38:00 port our code to a script

Building the "self-attention"
00:42:13 version 1: averaging past context with for loops, the weakest form of aggregation
00:47:11 the trick in self-attention: matrix multiply as weighted aggregation
00:51:54 version 2: using matrix multiply
00:54:42 version 3: adding softmax
00:58:26 minor code cleanup
01:00:18 positional encoding
01:02:00 THE CRUX OF THE VIDEO: version 4: self-attention
01:11:38 note 1: attention as communication
01:12:46 note 2: attention has no notion of space, operates over sets
01:13:40 note 3: there is no communication across batch dimension
01:14:14 note 4: encoder blocks vs. decoder blocks
01:15:39 note 5: attention vs. self-attention vs. cross-attention
01:16:56 note 6: "scaled" self-attention. why divide by sqrt(head_size)

Building the Transformer
01:19:11 inserting a single self-attention block to our network
01:21:59 multi-headed self-attention
01:24:25 feedforward layers of transformer block
01:26:48 residual connections
01:32:51 layernorm (and its relationship to our previous batchnorm)
01:37:49 scaling up the model! creating a few variables. adding dropout

Notes on Transformer
01:42:39 encoder vs. decoder vs. both (?) Transformers
01:46:22 super quick walkthrough of nanoGPT, batched multi-headed self-attention
01:48:53 back to ChatGPT, GPT-3, pretraining vs. finetuning, RLHF
01:54:32 conclusions

Corrections:
00:57:00 Oops "tokens from the _future_ cannot communicate", not "past". Sorry! :)
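
Bookmark note on EX2: the description suggests training on all 3-digit addition problems with the sum predicted in reverse order, but does not say how to lay out the text. The sketch below is one possible way to generate such a dataset in plain Python. The `a+b=` line format, the zero-padding, the inclusion of operands below 100, the 10% validation split, and the helper names `make_addition_examples` and `split_train_val` are all assumptions made for illustration, not something taken from the video.

```python
import random


def make_addition_examples(max_operand=999):
    """Enumerate every a+b problem with operands in 0..max_operand.

    Assumed format: operands zero-padded to 3 digits, sum zero-padded
    to 4 digits and then written in reverse (least significant digit first).
    """
    examples = []
    for a in range(max_operand + 1):
        for b in range(max_operand + 1):
            prompt = f"{a:03d}+{b:03d}="
            reversed_sum = f"{a + b:04d}"[::-1]
            examples.append(prompt + reversed_sum)
    return examples


def split_train_val(examples, val_fraction=0.1, seed=1337):
    """Shuffle with a fixed seed and hold out a fraction for validation."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


if __name__ == "__main__":
    data = make_addition_examples()
    train, val = split_train_val(data)
    print(len(train), len(val))
    print(train[0])
```

Writing the sum least-significant digit first means the model emits digits in the same order a carry propagates, which is presumably why the exercise suggests the reversed order. Joining the lines with newlines would give a character-level corpus that could be fed to the same kind of training script used for tiny shakespeare.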
·youtube.com·
Tutorial: DIY ChatGPT with Long Term Memories (external integration coming soon)
This repo: https://github.com/daveshap/LongtermChatExternalSources
Patreon: https://www.patreon.com/daveshap?fan_landing=true
GitHub: https://github.com/daveshap
Cognitive AI Lab Discord: https://discord.gg/yWYPwSFPjE
LinkedIn: https://www.linkedin.com/in/dshap-automator/
Twitter: https://twitter.com/dshap_automator
Mailing List: https://forms.gle/Sj4jYUb3quHLap1q9

00:00 - Introduction, Patreon, Comments, Etc
01:55 - Saving User Input
04:15 - Main Loop
07:14 - Side Tangent on Memories
09:20 - Compose the Corpus
14:12 - GPT-3 Prompts Used
15:18 - Response Handling
16:05 - Testing Output
19:00 - Teeing up Future Work
20:30 - Cognitive Architecture
21:30 - Outro (get in touch, comments, etc)

All opinions expressed are my own. My content is not: legal advice, medical advice, or financial advice.
·youtube.com·
When M.D. is a Machine Doctor
Helping medical doctors and patients in the Foundation Model A.I. era
·erictopol.substack.com·
Opinion | This Film Does Not Exist
How artificial intelligence can reimagine art from our past and influence our future.
·nytimes.com·
A Writer Used AI To Plagiarize Me. Now What?
Anyone can use AI to copy, remix, and publish stolen work. The platforms have no good answer for what happens next.
·bigtechnology.com·
Setting up Stable Diffusion for MacOS
With the landscape quickly changing, this article is fast becoming outdated! If you face issues...
·dev.to·
How to create a Offline service Chatbot?
I want to create an offline chatbot for personal use at home, so I don't want to use API.AI, WIT.AI, or any other online APIs. Is there any way I can create a chat...
·stackoverflow.com·
Columnist uses AI to write review about AI | Boing Boing
In his column on the future of the comics business ICV2 contributor Rob Salkowitz included a section about the impact of AI-generated art and writing. This caught my attention: “In fact, the AI is …
·boingboing.net·
Dense Vectors | Pinecone
An overview of dense vector embeddings with NLP.
·pinecone.io·