Attention in transformers, visually explained | Chapter 6, Deep Learning
Demystifying attention, the key mechanism inside transformers and LLMs.
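The core computation the video visualizes can be sketched as single-head scaled dot-product self-attention in NumPy. This is a minimal sketch under stated assumptions: the weight matrices are random placeholders, not trained parameters; the function names are hypothetical; and the causal mask reflects the GPT-style setting where each token may only look at itself and earlier tokens.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the row max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v, causal=True):
    """Single-head scaled dot-product self-attention (hypothetical helper).

    x:        (seq_len, d_model) token embeddings
    w_q, w_k: (d_model, d_k) query/key projections
    w_v:      (d_model, d_v) value projection
    """
    q = x @ w_q                       # what each position is looking for
    k = x @ w_k                       # what each position offers
    v = x @ w_v                       # what each position passes along
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)   # (seq_len, seq_len) relevance grid
    if causal:
        # Hide future positions: entries above the diagonal become -inf.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights         # weighted mix of value vectors

# Usage with placeholder (untrained) weights.
rng = np.random.default_rng(0)
seq_len, d_model, d_k = 4, 8, 8
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Each row of `weights` is a probability distribution over the positions that token is allowed to attend to, so the output for a token is a weighted blend of the value vectors of the tokens before it.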
But what is a GPT? Visual intro to Transformers | Deep learning, chapter 5
An introduction to transformers and their prerequisites.
A Simple Guide To Retrieval Augmented Generation Language Models — Smashing Magazine
Language models have shown impressive capabilities. But that doesn’t mean they’re without faults, as anyone who has witnessed a ChatGPT “hallucination” can attest. In this article, Joas Pambou diagnoses what causes hallucinations and explains not only what RAG is but also different approaches to using it to address language model limitations.
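The retrieve-then-prompt pattern behind RAG can be sketched in plain Python. This is a deliberately simplified sketch: it swaps the learned embedding model and vector store a real system would use for bag-of-words cosine similarity, and it stops at assembling the prompt rather than calling any particular LLM API. The documents and all function names (`retrieve`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric runs; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    # Rank documents by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(docs, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True)
    return ranked[:k]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    # Ground the model: put retrieved text in the prompt and ask it to stay within it.
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below. If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "LoRA adds trainable low-rank matrices to frozen weights.",
    "Retrieval augmented generation grounds answers in retrieved documents.",
    "The transformer architecture relies on self-attention.",
]
query = "How does retrieval augmented generation reduce hallucinations?"
print(retrieve(query, docs, k=1))
# ['Retrieval augmented generation grounds answers in retrieved documents.']
print(build_prompt(query, docs, k=1))
```

The instruction to answer only from the supplied context is what ties retrieval to hallucination reduction: the model is pointed at fresh, relevant text instead of relying solely on what it memorized during training.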
Code LoRA from Scratch - a Lightning Studio by sebastian
LoRA (Low-Rank Adaptation) is a popular technique to finetune LLMs more efficiently.
This Studio explains how LoRA works by coding it from scratch, which is an excellent exercise for looking under the hood of an algorithm.
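In the same from-scratch spirit, a LoRA-wrapped linear layer can be sketched in NumPy. This is an illustrative sketch, not the Studio's code: the class name is hypothetical, the pretrained weight is random, and the `alpha / r` scaling follows the common convention from the original LoRA paper (the Studio's own implementation may scale differently). The pretrained weight `W` stays frozen; only the small matrices `A` and `B` would be trained.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (hypothetical sketch).

    Effective weight: W + (alpha / r) * B @ A, with A: (r, in), B: (out, r).
    """

    def __init__(self, weight: np.ndarray, r: int = 4, alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.weight = weight                       # (out_features, in_features), frozen
        out_features, in_features = weight.shape
        self.A = rng.normal(scale=0.01, size=(r, in_features))  # small random init
        self.B = np.zeros((out_features, r))       # zero init: no change at the start
        self.scaling = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, in_features). Route through the low-rank path without
        # ever materializing the full (out, in) update matrix.
        base = x @ self.weight.T
        update = (x @ self.A.T) @ self.B.T * self.scaling
        return base + update

    def merged_weight(self) -> np.ndarray:
        # After training, fold the update into W so inference costs nothing extra.
        return self.weight + self.scaling * (self.B @ self.A)

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 5))            # stands in for a pretrained weight
layer = LoRALinear(W, r=2, alpha=4.0)
x = rng.normal(size=(2, 5))
print(np.allclose(layer(x), x @ W.T))  # True: B starts at zero
```

The efficiency comes from the parameter count: for a 4096 × 4096 weight matrix, full finetuning updates 16,777,216 values, while LoRA with `r = 8` trains only 8 × (4096 + 4096) = 65,536.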
AI/ML has seen rapid acceleration in model improvement over the last few years. Most state-of-the-art models in the field are based on the Transformer architecture. Examples include BERT (which, when applied to Google Search, resulted in what Google calls "one of the biggest leaps forward in the history of Search") and OpenAI's GPT-2 and GPT-3 (which can generate coherent text and essays).
This video by the author of the popular "Illustrated Transformer" guide will introduce the Transformer architecture and its various applications. This is a visual presentation accessible to people with various levels of ML experience.
Intro (0:00)
The Architecture of the Transformer (4:18)
Model Training (7:11)
Transformer LM Component 1: FFNN (10:01)
Transformer LM Component 2: Self-Attention (12:27)
Tokenization: Words to Token Ids (14:59)
Embedding: Breathe meaning into tokens (19:42)
Projecting the Output: Turning Computation into Language (24:11)
Final Note: Visualizing Probabilities (25:51)
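The stages named in these chapters (tokenization, embedding, a feed-forward block, projecting the output back onto the vocabulary, and turning scores into probabilities) can be strung together in a toy NumPy sketch. Everything here is an assumption for illustration: the six-word vocabulary, the sizes, and the untrained random weights are hypothetical; self-attention is left out to keep it short; and reusing the embedding matrix for the output projection (weight tying) is a common design choice, not necessarily what the video shows.

```python
import numpy as np

# Hypothetical toy vocabulary; real models use subword tokenizers with tens of thousands of entries.
vocab = ["<unk>", "the", "robot", "must", "obey", "orders"]
token_to_id = {tok: i for i, tok in enumerate(vocab)}

def tokenize(text: str) -> list[int]:
    # Words to token ids; unknown words map to id 0 ("<unk>").
    return [token_to_id.get(w, 0) for w in text.lower().split()]

rng = np.random.default_rng(0)
d_model, d_ff = 8, 32
embedding = rng.normal(scale=0.1, size=(len(vocab), d_model))  # one row per token id
w1 = rng.normal(scale=0.1, size=(d_model, d_ff))
w2 = rng.normal(scale=0.1, size=(d_ff, d_model))

def ffnn(h: np.ndarray) -> np.ndarray:
    # Position-wise feed-forward block: expand, ReLU, project back, add residual.
    return h + np.maximum(0.0, h @ w1) @ w2

def next_token_probs(text: str) -> np.ndarray:
    ids = tokenize(text)
    h = embedding[ids]              # (seq_len, d_model): embedding lookup
    h = ffnn(h)                     # self-attention layers omitted for brevity
    logits = h[-1] @ embedding.T    # project last position onto the vocabulary (tied weights)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()          # softmax: one probability per vocabulary entry

probs = next_token_probs("The robot must")
for tok, p in sorted(zip(vocab, probs), key=lambda t: -t[1]):
    print(f"{tok:>7s} {p:.3f}")
```

With random weights the printed distribution is meaningless; training is what shapes these probabilities so that a plausible continuation ranks highest.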
The Illustrated Transformer:
https://jalammar.github.io/illustrated-transformer/
Simple transformer language model notebook:
https://github.com/jalammar/jalammar.github.io/blob/master/notebooks/Simple_Transformer_Language_Model.ipynb
Philosophers On GPT-3 (updated with replies by GPT-3):
https://dailynous.com/2020/07/30/philosophers-gpt-3/
-----
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
More videos by Jay:
Jay's Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4
How GPT-3 Works - Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ
This book offers a comprehensive introduction to the central ideas that underpin deep learning. It is intended both for newcomers to machine learning and for those already experienced in the field.
ChatGPT Prompt Engineering for Developers
In ChatGPT Prompt Engineering for Developers, you will learn how to use a large language model (LLM) to quickly build new and powerful applications. Using the OpenAI API, you’ll...
Fixing LLM Hallucinations with Retrieval Augmentation in LangChain #6
Large Language Models (LLMs) have a data freshness problem. Even some of the most powerful models, like ChatGPT's gpt-3.5-turbo and GPT-4, have no idea about...
Practical Deep Learning for Coders - Practical Deep Learning
A free course designed for people with some coding experience, who want to learn how to apply deep learning and machine learning to practical problems.
Machine Learning with Python: from Linear Models to Deep Learning | edX
An in-depth introduction to the field of machine learning, from linear models to deep learning and reinforcement learning, through hands-on Python projects. -- Part of the MITx MicroMasters program in Statistics and Data Science.