Found 147 bookmarks
The current state of gpt-5
The GPT-5 launch was, uh, rough. A lot went wrong here, and I want to talk about what really happened...

Thank you Kilo Code for sponsoring! Check them out at:...
·youtube.com·
Proximal Policy Optimization (PPO) for LLMs Explained Intuitively
In this video, I break down Proximal Policy Optimization (PPO) from first principles, without assuming prior knowledge of Reinforcement Learning. By the end, you’ll understand the core RL building blocks that led to PPO, including:
🔵 Policy Gradient
🔵 Actor-Critic Models
🔵 The Value Function
🔵 The Generalized Advantage Estimate

In the LLM world, PPO was used to train reasoning models like OpenAI's o1/o3, and presumably Claude 3.7, Grok 3, etc. It’s the backbone of Reinforcement Learning from Human Feedback (RLHF), which helps align AI models with human preferences, and Reinforcement Learning with Verifiable Rewards (RLVR), which gives LLMs reasoning abilities.

Papers:
- PPO paper: https://arxiv.org/pdf/1707.06347
- GAE paper: https://arxiv.org/pdf/1506.02438
- TRPO paper: https://arxiv.org/pdf/1502.05477

Well-written blog posts:
- https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/
- https://huggingface.co/blog/NormalUhr/rlhf-pipeline
- https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/

Implementations:
- (Original) OpenAI Baselines: https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2
- Hugging Face: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py
- Hugging Face docs: https://huggingface.co/docs/trl/main/en/ppo_trainer

Mother of all RL books (Sutton & Barto): http://incompleteideas.net/book/RLbook2020.pdf

00:00 Intro
01:21 RL for LLMs
05:53 Policy Gradient
09:23 The Value Function
12:14 Generalized Advantage Estimate
17:17 End-to-end Training Algorithm
18:23 Importance Sampling
20:02 PPO Clipping
21:36 Outro

Special thanks to Anish Tondwalkar for discussing some of these concepts with me.
Note: At 21:10, A_t should have been inside the min. Thanks @t.w.7065 for catching this.
·youtube.com·
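The clipped surrogate objective the video builds up to (20:02) fits in a few lines. A minimal single-action Python sketch, keeping the advantage inside the min as the video's own correction for 21:10 notes; the function name and example numbers are illustrative, not from the video:

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective for one action:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A),
    with the ratio r = pi_new(a|s) / pi_old(a|s) computed from
    log-probabilities for numerical stability."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)

# With a positive advantage, pushing the ratio past 1 + eps gains nothing:
print(ppo_clip_objective(logp_new=0.5, logp_old=0.0, advantage=2.0))  # ratio ~1.65, capped at 1.2
print(ppo_clip_objective(logp_new=0.0, logp_old=0.0, advantage=2.0))  # ratio 1.0, unclipped
```

The min makes the objective pessimistic: improvements from moving the new policy far from the old one are capped, which is what keeps PPO updates conservative without TRPO's explicit constraint.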
Build Your Own Eval Tools With Notebooks!
35% off our evals course: https://bit.ly/evals-ai

Vincent introduces Marimo, a reactive notebook environment. He walks us through Marimo's features, including interactive and reactive charts and widget integration, and demonstrates how you can use these components to build annotation apps for evals. Vincent also highlights differences between Marimo and traditional Jupyter notebooks.

Links:
1. Repo w/ notebook: https://github.com/koaning/molabel
2. Vincent's drawing pad: https://www.amazon.com/Inspiroy-H640P-Graphics-Battery-Free-Sensitivity/dp/B075T6MTJX
3. Vincent's sites: https://koaning.io and https://calmcode.io/

00:00 Introduction to Data Science Journey
00:27 Exploring the Chick Weight Dataset
00:57 Interactive Data Analysis with Marimo
02:04 Importance of Looking at Data
03:32 Advanced Data Visualization Techniques
05:14 Introduction to Marimo's Unique Features
06:44 Reactive Programming in Marimo
12:50 AI Integration and Custom Rules
15:30 Marimo's Storage and Export Options
27:16 Advanced Visualization and Annotation
37:10 Introduction to Any Widget
37:45 Building Custom Widgets
38:56 Showcasing the Scatter Widget
40:29 Defining Widgets with Any Widget
45:58 Annotation Widgets and Their Uses
52:14 Exploring More Widget Capabilities
01:01:32 Marimo's App Mode and Deployment
01:03:37 Final Thoughts and Future Directions
01:04:45 Q&A and Closing Remarks
·youtube.com·
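Marimo's key difference from Jupyter, per the video, is reactivity: cells form a dependency graph, and when an upstream value changes, every downstream cell re-runs automatically. A toy pure-Python sketch of that dataflow idea (this is not Marimo's actual API, just an illustration of the model):

```python
# Toy reactive-notebook dataflow: each "cell" declares the names it reads
# and the name it defines; setting a value re-runs every downstream cell,
# the way Marimo re-executes dependents of a changed variable.

class ReactiveGraph:
    def __init__(self):
        self.cells = []     # (reads, writes, fn), assumed in topological order
        self.values = {}

    def cell(self, reads, writes):
        def register(fn):
            self.cells.append((reads, writes, fn))
            return fn
        return register

    def set(self, name, value):
        self.values[name] = value
        changed = {name}
        for reads, writes, fn in self.cells:
            if changed & set(reads):                       # stale: re-run this cell
                self.values[writes] = fn(*[self.values[r] for r in reads])
                changed.add(writes)                        # propagate downstream

g = ReactiveGraph()

@g.cell(reads=["n"], writes="squares")
def squares(n):
    return [i * i for i in range(n)]

@g.cell(reads=["squares"], writes="total")
def total(squares):
    return sum(squares)

g.set("n", 4)               # both downstream cells re-run automatically
print(g.values["total"])    # sum of 0, 1, 4, 9 -> 14
```

Changing `n` again recomputes `squares` and `total` with no manual re-execution, which is exactly the stale-cell problem in traditional Jupyter that this model avoids.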
EASIEST Way to Fine-Tune a LLM and Use It With Ollama
Get started with 10Web and their AI Website Builder API: https://10web.io/website-builder-api/?utm_source=YouTube&utm_medium=Influencer&utm_campaign=TechWithTim

Today, you'll learn how to fine-tune LLMs in Python for use in Ollama. I'll walk you through it step by step, give you all the code, and show you how to test it out.

DevLaunch is my mentorship program where I personally help developers go beyond tutorials, build real-world projects, and actually land jobs. No fluff. Just real accountability, proven strategies, and hands-on guidance. Learn more here: https://training.devlaunch.us/tim

⏳ Timestamps ⏳
00:00 | What is Fine-Tuning?
02:25 | Gathering Data
05:52 | Google Colab Setup
09:17 | Fine-Tuning with Unsloth
16:58 | Model Setup in Ollama

🎞 Video Resources 🎞
Code in this video: https://drive.google.com/drive/folders/1p4ZilsJsdxB5lH6ZBMdIEJBt0WVUMsDq?usp=sharing
Notebook on Google Colab: https://colab.research.google.com/drive/1NsRGmHVupulRzsq9iUTx8V8WgTSpO_04?usp=sharing

Hashtags: #Python #Ollama #LLM
·youtube.com·
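The last step in the timestamps (16:58, Model Setup in Ollama) comes down to writing a Modelfile that points Ollama at the fine-tuned model exported to GGUF. A minimal sketch; the file name, system prompt, and temperature here are illustrative assumptions, not values from the video:

```python
from pathlib import Path

def make_modelfile(gguf_path: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Build a minimal Ollama Modelfile for a fine-tuned model in GGUF format.
    FROM, PARAMETER, and SYSTEM are standard Modelfile directives."""
    return (
        f"FROM {gguf_path}\n"
        f"PARAMETER temperature {temperature}\n"
        f'SYSTEM "{system_prompt}"\n'
    )

# Hypothetical export name; Unsloth's notebook writes the merged GGUF for you.
modelfile = make_modelfile("./my-finetune.gguf", "You answer questions about my docs.")
Path("Modelfile").write_text(modelfile)
# then: ollama create my-finetune -f Modelfile
#       ollama run my-finetune
```

Once `ollama create` imports the GGUF, the fine-tuned model behaves like any other local Ollama model.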
The LLM's RL Revelation We Didn't See Coming
Try out Warp 2.0 now, the current rank #1 AI on Terminal Bench, outperforming Claude Code: https://go.warp.dev/bycloud
You can also use code "BYCLOUD" to get Warp Pro for 1 month free (limited to 1,000 redemptions).

My Newsletter: https://mail.bycloud.ai/
My project (find, discover & explain AI research semantically): https://findmypapers.ai/
My Patreon (get bundle access for my newsletter & findmypapers): https://www.patreon.com/c/bycloud

Papers:
- Training language models to follow instructions with human feedback: https://arxiv.org/abs/2203.02155
- DeepSeek-R1 (Aha Moment): https://arxiv.org/abs/2501.12948
- Understanding R1-Zero-Like Training: A Critical Perspective: https://arxiv.org/pdf/2503.20783
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?: https://arxiv.org/abs/2504.13837
- Reinforcement Learning Finetunes Small Subnetworks in Large Language Models: https://arxiv.org/abs/2505.11711
- Spurious Rewards: Rethinking Training Signals in RLVR: https://arxiv.org/abs/2506.10947

Try out my new fav place to learn how to code: https://scrimba.com/?via=bycloudAI

This video is supported by the kind Patrons & YouTube Members:
🙏 Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Discovery, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa, Toru Mon

[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] bycloud@smoothmedia.co
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Ko-fi] https://ko-fi.com/bycloudai
·youtube.com·
THIS is why large language models can understand the world
5 years ago, nobody would have guessed that scaling up LLMs would be as successful as it has been. This belief was due, in part, to the fact that all known statistical learning theory predicted that massively oversized models should overfit, and hence perform worse than smaller models. Yet the undeniable fact is that modern LLMs do possess models of the world that allow them to generalize beyond their training data. Why do larger models generalize better than smaller models? Why does training a model to predict internet text cause it to develop world models? Come deep dive into the inner workings of neural network training to understand why scaling LLMs works so damn well.

Want to see more videos like this in the future? Support me on Ko-fi: https://ko-fi.com/algorithmicsimplicity

Papers referenced:
- Double Descent: https://arxiv.org/abs/1812.11118
- The Lottery Ticket Hypothesis: https://arxiv.org/abs/1803.03635

My previous videos on autoregressive Transformers:
- Auto-regression (and diffusion): https://youtu.be/zc5NTeJbk-k
- Transformers: https://youtu.be/kWLed8o5M2Y
·youtube.com·