Not sure what book to read next? Let artificial intelligence find the next book for you to read. Type in the titles of books you like and see what book recommendations AI will find for you out of more than a million titles. Enjoy reading!
Meet Claude: Anthropic’s Rival to ChatGPT (Scale AI blog)
ChatGPT has captured LLM headlines and amazed the AI community with its extensive natural language processing capabilities. A new LLM from Anthropic called Claude is competitive with ChatGPT and offers great promise. We evaluate both models head to head and give you our thoughts on how they compare.
Let's build GPT: from scratch, in code, spelled out.
We build a Generatively Pretrained Transformer (GPT), following the paper "Attention is All You Need" and OpenAI's GPT-2 / GPT-3. We talk about connections to ChatGPT, which has taken the world by storm. We watch GitHub Copilot, itself a GPT, help us write a GPT (meta :D!). I recommend people watch the earlier makemore videos to get comfortable with the autoregressive language modeling framework and basics of tensors and PyTorch nn, which we take for granted in this video.
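The "autoregressive language modeling framework" mentioned above comes down to turning text into integer ids and predicting each next id from the ones before it. The sketch below is a minimal, dependency-free illustration of that idea, not code from the video or its notebook. It assumes a character-level vocabulary, uses a toy string in place of the Tiny Shakespeare dataset, and `autoregressive_pairs` is a hypothetical helper name.

```python
# Character-level tokenizer plus autoregressive (context -> target) pairs.
# Assumption: a toy string stands in for the Tiny Shakespeare text.
text = "hello world"

# Vocabulary: every distinct character, sorted for a stable ordering.
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}   # character -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> character


def encode(s):
    """Map a string to a list of integer token ids."""
    return [stoi[c] for c in s]


def decode(ids):
    """Map a list of token ids back to a string."""
    return "".join(itos[i] for i in ids)


def autoregressive_pairs(ids, block_size):
    """Return (context, target) pairs from one chunk of block_size + 1 tokens.

    Each prefix of the chunk is a context, and the token right after it
    is the target, so one chunk gives block_size training examples.
    (Hypothetical helper for illustration.)
    """
    chunk = ids[: block_size + 1]
    return [(chunk[: t + 1], chunk[t + 1]) for t in range(block_size)]


ids = encode(text)
assert decode(ids) == text  # encoding round-trips
for context, target in autoregressive_pairs(ids, block_size=4):
    print(decode(context), "->", decode([target]))
```

One chunk of `block_size + 1` tokens therefore yields `block_size` training examples, with contexts ranging from one token up to `block_size` tokens.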
Links:
- Google colab for the video: https://colab.research.google.com/drive/1JMLa53HDuA-i7ZBmqV7ZnA3c_fvtXnx-?usp=sharing
- GitHub repo for the video: https://github.com/karpathy/ng-video-lecture
- nanoGPT repo: https://github.com/karpathy/nanoGPT
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- our Discord channel: https://discord.gg/3zy8kqD9Cp
Supplementary links:
- Attention is All You Need paper: https://arxiv.org/abs/1706.03762
- OpenAI GPT-3 paper: https://arxiv.org/abs/2005.14165
- OpenAI ChatGPT blog post: https://openai.com/blog/chatgpt/
- The GPU I'm training the model on is from Lambda GPU Cloud, I think the best and easiest way to spin up an on-demand GPU instance in the cloud that you can ssh to: https://lambdalabs.com . If you prefer to work in notebooks, I think the easiest path today is Google Colab.
Suggested exercises:
- EX1: The n-dimensional tensor mastery challenge: Combine the `Head` and `MultiHeadAttention` into one class that processes all the heads in parallel, treating the heads as another batch dimension (answer is in nanoGPT).
- EX2: Train the GPT on your own dataset of choice! What other data could be fun to blabber on about? (A fun suggestion if you like: train on all the possible 3-digit addition problems and predict the sum in the reverse order. Does your Transformer learn the correct addition algorithm? Does it correctly generalize to the validation set?).
- EX3: Find a dataset that is very large, so large that you can't see a gap between train and val loss. Pretrain the transformer on this data, then initialize with that model and finetune it on tiny shakespeare with a smaller number of steps and lower learning rate. Can you obtain a lower validation loss by the use of pretraining?
- EX4: Read some transformer papers and implement one additional feature or change that people seem to use. Does it improve the performance of your GPT?
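For EX2, one possible way to build the addition dataset is sketched below. The formatting choices are assumptions and are not part of the exercise. Operands run from 000 to 999 and are zero-padded. The sum is zero-padded to four digits and then reversed. 10% of the shuffled problems are held out for validation.

```python
import random


def addition_example(a, b):
    """Format one problem as 'aaa+bbb=' followed by the sum's digits reversed.

    Assumption: operands are zero-padded to 3 digits and the sum to 4 digits,
    so every example has the same length (8 + 4 = 12 characters).
    """
    return f"{a:03d}+{b:03d}=" + f"{a + b:04d}"[::-1]


# All 1000 * 1000 ordered pairs of 3-digit operands (000-999).
problems = [addition_example(a, b) for a in range(1000) for b in range(1000)]

# Shuffle with a fixed seed, then hold out 10% as the validation set.
random.Random(1337).shuffle(problems)
n_val = len(problems) // 10
val, train = problems[:n_val], problems[n_val:]

print(addition_example(123, 989))
```

Reversing the sum means the model produces the ones digit first, which is the same order in which carries propagate in hand addition. Because every ordered pair appears exactly once, the train and validation sets share no problems. This is what makes the validation loss a test of generalization.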
Chapters:
00:00:00 intro: ChatGPT, Transformers, nanoGPT, Shakespeare
baseline language modeling, code setup
00:07:52 reading and exploring the data
00:09:28 tokenization, train/val split
00:14:27 data loader: batches of chunks of data
00:22:11 simplest baseline: bigram language model, loss, generation
00:34:53 training the bigram model
00:38:00 port our code to a script
Building the "self-attention"
00:42:13 version 1: averaging past context with for loops, the weakest form of aggregation
00:47:11 the trick in self-attention: matrix multiply as weighted aggregation
00:51:54 version 2: using matrix multiply
00:54:42 version 3: adding softmax
00:58:26 minor code cleanup
01:00:18 positional encoding
01:02:00 THE CRUX OF THE VIDEO: version 4: self-attention
01:11:38 note 1: attention as communication
01:12:46 note 2: attention has no notion of space, operates over sets
01:13:40 note 3: there is no communication across batch dimension
01:14:14 note 4: encoder blocks vs. decoder blocks
01:15:39 note 5: attention vs. self-attention vs. cross-attention
01:16:56 note 6: "scaled" self-attention. why divide by sqrt(head_size)
Building the Transformer
01:19:11 inserting a single self-attention block to our network
01:21:59 multi-headed self-attention
01:24:25 feedforward layers of transformer block
01:26:48 residual connections
01:32:51 layernorm (and its relationship to our previous batchnorm)
01:37:49 scaling up the model! creating a few variables. adding dropout
Notes on Transformer
01:42:39 encoder vs. decoder vs. both (?) Transformers
01:46:22 super quick walkthrough of nanoGPT, batched multi-headed self-attention
01:48:53 back to ChatGPT, GPT-3, pretraining vs. finetuning, RLHF
01:54:32 conclusions
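The chapters on versions 1 and 2 hinge on one observation. Averaging each position with all earlier positions is the same as multiplying by a lower-triangular matrix whose rows sum to 1. The video does this with PyTorch tensors. The sketch below uses plain Python lists so it runs without dependencies. The toy input `x` and the `matmul` helper are illustrative assumptions.

```python
# Toy input: T = 4 time steps, C = 2 channels (plain lists, no PyTorch).
x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
T, C = len(x), len(x[0])

# Version 1: explicit loops. Each position averages itself and all earlier positions.
xbow1 = []
for t in range(T):
    prefix = x[: t + 1]
    xbow1.append([sum(row[c] for row in prefix) / (t + 1) for c in range(C)])

# Version 2: the same result as one matrix multiply W @ x, where W is
# lower-triangular (no weight on future positions) and each row sums to 1.
W = [[1.0 / (t + 1) if s <= t else 0.0 for s in range(T)] for t in range(T)]


def matmul(A, B):
    """Plain-Python matrix product of an (n x k) and a (k x m) list-of-lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


xbow2 = matmul(W, x)

# The two versions agree up to floating-point rounding.
for r1, r2 in zip(xbow1, xbow2):
    assert all(abs(a - b) < 1e-9 for a, b in zip(r1, r2))
```

Replacing the uniform rows of `W` with data-dependent weights is what turns this fixed average into self-attention.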
Corrections:
00:57:00 Oops "tokens from the _future_ cannot communicate", not "past". Sorry! :)
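The corrected statement is that each token can attend to itself and to earlier tokens, while tokens from the future are masked out. The sketch below computes single-head attention weights under that causal mask and also applies the 1/sqrt(head_size) scaling from note 6. It is written in plain Python for illustration. `causal_attention_weights` and the toy `q`/`k` vectors are hypothetical. Restricting the scores to positions `s <= t` stands in for the usual approach of filling future scores with -inf before the softmax, since exp(-inf) is 0.

```python
import math


def causal_attention_weights(q, k):
    """Scaled, causally masked attention weights for a single head.

    q, k: lists of T vectors of length head_size (hypothetical toy inputs).
    Position t may only attend to positions s <= t, so tokens from the
    future cannot communicate with the past.
    """
    T, head_size = len(q), len(q[0])
    scale = 1.0 / math.sqrt(head_size)  # "scaled" self-attention (note 6)
    weights = []
    for t in range(T):
        # Scores against the current and earlier positions only.
        scores = [scale * sum(a * b for a, b in zip(q[t], k[s])) for s in range(t + 1)]
        m = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in scores]
        total = sum(exps)
        # Softmax over allowed positions, then zero weight for future positions.
        weights.append([e / total for e in exps] + [0.0] * (T - t - 1))
    return weights


q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in causal_attention_weights(q, k):
    print([round(w, 3) for w in row])
```

The scaling matters for the following reason. A dot product of two head_size-dimensional vectors with independent, zero-mean, unit-variance entries has a variance of about head_size. Dividing by sqrt(head_size) keeps the scores near unit variance, so the softmax does not collapse toward a one-hot vector.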
First look - ChatGPT + WolframAlpha (GPT-3.5 and Wolfram|Alpha via LangChain by James Weaver)
Try it here: https://huggingface.co/spaces/JavaFXpert/Chat-GPT-LangChain
Your API key from here: https://beta.openai.com/account/api-keys
Wolfram Alpha: http...
Getty Images is suing the creators of AI art tool Stable Diffusion for scraping its content
Getty Images claims Stability AI ‘unlawfully’ scraped millions of images from its site. It’s a significant escalation in the developing legal battles between generative AI firms and content creators.
Tutorial: DIY ChatGPT with Long Term Memories (external integration coming soon)
This repo: https://github.com/daveshap/LongtermChatExternalSources
Patreon: https://www.patreon.com/daveshap?fan_landing=true
GitHub: https://github.com/daveshap
Cognitive AI Lab Discord: https://discord.gg/yWYPwSFPjE
LinkedIn: https://www.linkedin.com/in/dshap-automator/
Twitter: https://twitter.com/dshap_automator
Mailing List: https://forms.gle/Sj4jYUb3quHLap1q9
00:00 - Introduction, Patreon, Comments, Etc
01:55 - Saving User Input
04:15 - Main Loop
07:14 - Side Tangent on Memories
09:20 - Compose the Corpus
14:12 - GPT-3 Prompts Used
15:18 - Response Handling
16:05 - Testing Output
19:00 - Teeing up Future Work
20:30 - Cognitive Architecture
21:30 - Outro (get in touch, comments, etc)
All opinions expressed are my own. My content is not: legal advice, medical advice, or financial advice.
arnabm14/Dev_AIChatbot_NLP: A basic tutorial on how to create a smart chatbot using AI and NLP