Search AI/ML

Found 7 bookmarks

Proximal Policy Optimization (PPO) for LLMs Explained Intuitively

In this video, I break down Proximal Policy Optimization (PPO) from first principles, without assuming prior knowledge of Reinforcement Learning. By the end, you’ll understand the core RL building blocks that led to PPO, including:
🔵 Policy Gradient
🔵 Actor-Critic Models
🔵 The Value Function
🔵 The Generalized Advantage Estimate

In the LLM world, PPO was used to train reasoning models like OpenAI's o1/o3, and presumably Claude 3.7, Grok 3, etc. It’s the backbone of Reinforcement Learning from Human Feedback (RLHF), which helps align AI models with human preferences, and of Reinforcement Learning with Verifiable Rewards (RLVR), which gives LLMs reasoning abilities.

Papers:
- PPO paper: https://arxiv.org/pdf/1707.06347
- GAE paper: https://arxiv.org/pdf/1506.02438
- TRPO paper: https://arxiv.org/pdf/1502.05477

Well-written blogposts:
- https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/
- https://huggingface.co/blog/NormalUhr/rlhf-pipeline
- https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/

Implementations:
- (Original) OpenAI Baselines: https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2
- Hugging Face: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py
- Hugging Face docs: https://huggingface.co/docs/trl/main/en/ppo_trainer

Mother of all RL books (Barto & Sutton): http://incompleteideas.net/book/RLbook2020.pdf

00:00 Intro
01:21 RL for LLMs
05:53 Policy Gradient
09:23 The Value Function
12:14 Generalized Advantage Estimate
17:17 End-to-end Training Algorithm
18:23 Importance Sampling
20:02 PPO Clipping
21:36 Outro

Special thanks to Anish Tondwalkar for discussing some of these concepts with me.

Note: At 21:10, A_t should have been inside the min. Thanks @t.w.7065 for catching this.
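As a rough illustration of two topics named in this description (the Generalized Advantage Estimate and PPO clipping), here is a minimal pure-Python sketch. It is not the video's code and not any library's API: the function names, the default values for `gamma`, `lam`, and `clip_eps`, and the list-based inputs are all assumptions made for the example. It follows the standard formulas from the GAE and PPO papers, and it keeps A_t inside the `min`, which is the correction mentioned in the note.

```python
import math


def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimate for one trajectory.

    `values` must hold one more entry than `rewards`: the final entry is the
    bootstrap value of the state reached after the last step (an assumption
    of this sketch; use 0.0 for a terminal state).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step can reuse the advantage of the next step.
    for t in reversed(range(len(rewards))):
        # One-step TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(logp_new, logp_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate objective (a quantity to maximize)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Importance-sampling ratio between the new and old policy.
        ratio = math.exp(lp_new - lp_old)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # The advantage multiplies both terms inside the min.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)
```

With a positive advantage, the objective stops rewarding the policy once the ratio exceeds 1 + `clip_eps`; with a negative advantage, the unclipped term is kept when it is worse, so large harmful updates are still penalized.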

#model training #tutorial

·youtube.com·Jul 7, 2025

EASIEST Way to Fine-Tune a LLM and Use It With Ollama

Get started with 10Web and their AI Website Builder API: https://10web.io/website-builder-api/?utm_source=YouTube&utm_medium=Influencer&utm_campaign=TechWithTim

Today, you'll learn how to fine-tune LLMs in Python for use in Ollama. I'll walk you through it step by step, give you all the code, and show you how to test it out.

DevLaunch is my mentorship program where I personally help developers go beyond tutorials, build real-world projects, and actually land jobs. No fluff. Just real accountability, proven strategies, and hands-on guidance. Learn more here - https://training.devlaunch.us/tim

⏳ Timestamps ⏳
00:00 | What is Fine-Tuning?
02:25 | Gathering Data
05:52 | Google Colab Setup
09:17 | Fine-Tuning with Unsloth
16:58 | Model Setup in Ollama

🎞 Video Resources 🎞
Code in this video: https://drive.google.com/drive/folders/1p4ZilsJsdxB5lH6ZBMdIEJBt0WVUMsDq?usp=sharing
Google Colab notebook: https://colab.research.google.com/drive/1NsRGmHVupulRzsq9iUTx8V8WgTSpO_04?usp=sharing

Hashtags #Python #Ollama #LLM

#fine tuning #model training #tutorial

·youtube.com·Jun 27, 2025

How to Fine Tune your own LLM using LoRA (on a CUSTOM dataset!)

That gameboy blender animation...took 6 hours to render 😅. Anyway, had a ton of fun coding this up and finally getting back to some proper ML. I've been thi...

#model training #fine tuning #tutorial

·youtube.com·Jun 9, 2025

Handwritten Text Recognition using OCR

In this article, we carry out handwritten text recognition using OCR. We fine tune the TrOCR model on the GNHK dataset.

#OCR #vision #tutorial #model training #handwriting

·learnopencv.com·Jun 1, 2025

Reinforcement Learning with Neural Networks: Essential Concepts

Reinforcement Learning has helped train neural networks to win games, drive cars and even get ChatGPT to sound more human when it responds to your prompt. Th...

#learn #model training #tutorial

·youtube.com·Apr 9, 2025

The Most Important Algorithm in Machine Learning

Shortform link: https://shortform.com/artem

In this video we will talk about backpropagation, an algorithm powering the entire field of machine learning, and try to derive it from first principles.

OUTLINE:
00:00 Introduction
01:28 Historical background
02:50 Curve Fitting problem
06:26 Random vs guided adjustments
09:43 Derivatives
14:34 Gradient Descent
16:23 Higher dimensions
21:36 Chain Rule Intuition
27:01 Computational Graph and Autodiff
36:24 Summary
38:16 Shortform
39:20 Outro

USEFUL RESOURCES:
Andrej Karpathy's playlist: https://youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ&si=zBUZW5kufVPLVy9E
Jürgen Schmidhuber's blog on the history of backprop: https://people.idsia.ch/~juergen/who-invented-backpropagation.html

CREDITS:
Icons by https://www.freepik.com/
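To make the curve fitting, gradient descent, and chain rule items from this outline concrete, here is a small pure-Python sketch. It is not taken from the video: the function name `fit_line`, the learning rate, and the step count are assumptions chosen for illustration. It fits a straight line with hand-derived gradients; backpropagation in a deeper network applies the same chain rule through every node of the computational graph.

```python
def fit_line(xs, ys, lr=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            pred = w * x + b                    # forward pass
            dloss_dpred = 2.0 * (pred - y) / n  # d(MSE)/d(pred)
            grad_w += dloss_dpred * x           # chain rule: d(pred)/dw = x
            grad_b += dloss_dpred               # chain rule: d(pred)/db = 1
        # Step against the gradient (a guided adjustment, not a random one).
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Points that lie exactly on y = 2x + 1.
w, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

On this toy data the fitted `w` and `b` should approach 2 and 1; a larger learning rate may diverge, so treat the defaults as tuned for this example only.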

#learn #model training #math #tutorial

·youtube.com·Mar 22, 2025

Llama from scratch (or how to implement a paper without crying) | Brian Kitano

#model training #local model #code #learn #programming #tutorial

·blog.briankitano.com·Aug 10, 2023
