Found 7 bookmarks
Proximal Policy Optimization (PPO) for LLMs Explained Intuitively
In this video, I break down Proximal Policy Optimization (PPO) from first principles, without assuming prior knowledge of reinforcement learning. By the end, you'll understand the core RL building blocks that led to PPO, including:
🔵 Policy Gradient
🔵 Actor-Critic Models
🔵 The Value Function
🔵 The Generalized Advantage Estimate
In the LLM world, PPO was used to train reasoning models like OpenAI's o1/o3, and presumably Claude 3.7, Grok 3, etc. It's the backbone of Reinforcement Learning from Human Feedback (RLHF), which helps align AI models with human preferences, and of Reinforcement Learning with Verifiable Rewards (RLVR), which gives LLMs reasoning abilities.
Papers:
- PPO paper: https://arxiv.org/pdf/1707.06347
- GAE paper: https://arxiv.org/pdf/1506.02438
- TRPO paper: https://arxiv.org/pdf/1502.05477
Well-written blog posts:
- https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/
- https://huggingface.co/blog/NormalUhr/rlhf-pipeline
- https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Implementations:
- (Original) OpenAI Baselines: https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2
- Hugging Face: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py
- Hugging Face docs: https://huggingface.co/docs/trl/main/en/ppo_trainer
Mother of all RL books (Sutton & Barto): http://incompleteideas.net/book/RLbook2020.pdf
00:00 Intro
01:21 RL for LLMs
05:53 Policy Gradient
09:23 The Value Function
12:14 Generalized Advantage Estimate
17:17 End-to-end Training Algorithm
18:23 Importance Sampling
20:02 PPO Clipping
21:36 Outro
Special thanks to Anish Tondwalkar for discussing some of these concepts with me.
Note: At 21:10, A_t should have been inside the min. Thanks @t.w.7065 for catching this.
·youtube.com·
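The note about A_t belonging inside the min refers to the clipped surrogate objective from the PPO paper, L^CLIP = E_t[min(r_t A_t, clip(r_t, 1 - eps, 1 + eps) A_t)]. A minimal sketch of that formula in plain Python follows; the function name `ppo_clip_objective`, the list-based inputs, and the default `epsilon=0.2` are illustrative assumptions, not code from the video or from any library.

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    """Mean clipped surrogate objective over a batch of timesteps.

    ratios:     probability ratios r_t = pi_new(a_t|s_t) / pi_old(a_t|s_t)
    advantages: advantage estimates A_t (e.g. from GAE)
    epsilon:    clip range (0.2 is an assumed, commonly used default)
    """
    terms = []
    for r, a in zip(ratios, advantages):
        # Clip the ratio to [1 - epsilon, 1 + epsilon].
        clipped_r = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # A_t multiplies BOTH arguments, so it sits inside the min.
        terms.append(min(r * a, clipped_r * a))
    return sum(terms) / len(terms)


# Positive advantage, ratio above the clip range: the gain is capped.
capped = ppo_clip_objective([1.5], [2.0])
# Negative advantage, ratio above the clip range: the penalty is NOT capped.
uncapped = ppo_clip_objective([1.5], [-1.0])
```

Because the min takes the more pessimistic of the two terms, the policy gets no extra credit for moving far from the old policy, but is still fully penalized when such a move makes things worse.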
EASIEST Way to Fine-Tune a LLM and Use It With Ollama
Get started with 10Web and their AI Website Builder API: https://10web.io/website-builder-api/?utm_source=YouTube&utm_medium=Influencer&utm_campaign=TechWithTim
Today, you'll learn how to fine-tune LLMs in Python for use in Ollama. I'll walk you through it step by step, give you all the code, and show you how to test it out.
DevLaunch is my mentorship program where I personally help developers go beyond tutorials, build real-world projects, and actually land jobs. No fluff. Just real accountability, proven strategies, and hands-on guidance. Learn more here - https://training.devlaunch.us/tim
⏳ Timestamps ⏳
00:00 | What is Fine-Tuning?
02:25 | Gathering Data
05:52 | Google Colab Setup
09:17 | Fine-Tuning with Unsloth
16:58 | Model Setup in Ollama
🎞 Video Resources 🎞
Code in this video: https://drive.google.com/drive/folders/1p4ZilsJsdxB5lH6ZBMdIEJBt0WVUMsDq?usp=sharing
Google Colab notebook: https://colab.research.google.com/drive/1NsRGmHVupulRzsq9iUTx8V8WgTSpO_04?usp=sharing
Hashtags #Python #Ollama #LLM
·youtube.com·
The Most Important Algorithm in Machine Learning
Shortform link: https://shortform.com/artem
In this video we will talk about backpropagation, an algorithm powering the entire field of machine learning, and try to derive it from first principles.
OUTLINE:
00:00 Introduction
01:28 Historical background
02:50 Curve Fitting problem
06:26 Random vs guided adjustments
09:43 Derivatives
14:34 Gradient Descent
16:23 Higher dimensions
21:36 Chain Rule Intuition
27:01 Computational Graph and Autodiff
36:24 Summary
38:16 Shortform
39:20 Outro
USEFUL RESOURCES:
Andrej Karpathy's playlist: https://youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&si=zBUZW5kufVPLVy9E
Jürgen Schmidhuber's blog on the history of backprop: https://people.idsia.ch/~juergen/who-invented-backpropagation.html
CREDITS:
Icons by https://www.freepik.com/
·youtube.com·
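The chain rule and computational graph topics in the outline can be illustrated with a very small reverse-mode autodiff sketch. The `Value` class below is a hypothetical minimal design written for this note (it supports only scalar addition and multiplication); it is not taken from the video or from any library.

```python
class Value:
    """Scalar node in a computational graph (hypothetical minimal class)."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0                  # d(output)/d(this node), filled in by backward()
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(this node)/d(parent), one per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self):
        # Order nodes so that each one is processed only after every node
        # that depends on it (reverse topological order).
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in zip(node._parents, node._local_grads):
                # Chain rule: accumulate local derivative times upstream gradient.
                parent.grad += local * node.grad


x, w, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b    # forward pass: y = 7
loss = y * y     # loss = y^2 = 49
loss.backward()  # fills in w.grad, x.grad, b.grad
```

After `backward()`, a gradient descent step would nudge each parameter against its gradient, for example `w.data -= 0.01 * w.grad` with an assumed learning rate of 0.01.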