Olmo 3: Charting a path through the model flow to lead open-source AI | Ai2
Our new flagship Olmo 3 model family empowers the open source community with not only state-of-the-art open models, but the entire model flow and full traceability back to training data.
The GPT-5 launch was uh, rough. A lot went wrong here, and I want to talk about what really happened...Thank you Kilo Code for sponsoring! Check them out at:...
What's the strongest AI model you can train on a laptop in five minutes?
What’s the strongest model I can train on my MacBook Pro in five minutes? I’ll give the answer upfront: the best 5-minute model I could train was a ~1.8M-param…
OpenAI’s new open weight (Apache 2) models are really good
The long promised OpenAI open weight models are here, and they are very impressive. They’re available under proper open source licenses—Apache 2.0—and come in two sizes, 120B and 20B. OpenAI’s …
Proximal Policy Optimization (PPO) for LLMs Explained Intuitively
In this video, I break down Proximal Policy Optimization (PPO) from first principles, without assuming prior knowledge of Reinforcement Learning. By the end, you’ll understand the core RL building blocks that led to PPO, including:
🔵 Policy Gradient
🔵 Actor-Critic Models
🔵 The Value Function
🔵 The Generalized Advantage Estimate
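One of the building blocks listed above, the Generalized Advantage Estimate, can be sketched in a few lines of Python. This is a minimal illustration, not code from the video; the variable names and the convention that `values` carries one extra bootstrap entry are my own:

```python
def gae(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al., 2015).

    `values` holds V(s_0)..V(s_T), one more entry than `rewards`,
    so the final state's value bootstraps the last step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD error: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum: A_t = delta_t + (gamma * lam) * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

# With gamma = lam = 1 this reduces to (return from step t) - V(s_t):
print(gae([1.0, 1.0], [0.0, 0.0, 0.0], gamma=1.0, lam=1.0))  # [2.0, 1.0]
```

Setting `lam=0` recovers the one-step TD advantage; `lam=1` recovers the full Monte Carlo return minus the baseline, which is the bias/variance trade-off the video discusses.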
In the LLM world, PPO was used to train reasoning models like OpenAI's o1/o3, and presumably Claude 3.7, Grok 3, etc. It's the backbone of Reinforcement Learning from Human Feedback (RLHF), which helps align AI models with human preferences, and of Reinforcement Learning with Verifiable Rewards (RLVR), which gives LLMs reasoning abilities.
Papers:
- PPO paper: https://arxiv.org/pdf/1707.06347
- GAE paper: https://arxiv.org/pdf/1506.02438
- TRPO paper: https://arxiv.org/pdf/1502.05477
Well-written blogposts:
- https://danieltakeshi.github.io/2017/04/02/notes-on-the-generalized-advantage-estimation-paper/
- https://huggingface.co/blog/NormalUhr/rlhf-pipeline
- https://iclr-blog-track.github.io/2022/03/25/ppo-implementation-details/
Implementations:
- (Original) OpenAI Baselines: https://github.com/openai/baselines/blob/ea25b9e8b234e6ee1bca43083f8f3cf974143998/baselines/ppo2
- Hugging Face: https://github.com/huggingface/trl/blob/main/trl/trainer/ppo_trainer.py
- Hugging Face docs: https://huggingface.co/docs/trl/main/en/ppo_trainer
Mother of all RL books (Sutton & Barto):
http://incompleteideas.net/book/RLbook2020.pdf
00:00 Intro
01:21 RL for LLMs
05:53 Policy Gradient
09:23 The Value Function
12:14 Generalized Advantage Estimate
17:17 End-to-end Training Algorithm
18:23 Importance Sampling
20:02 PPO Clipping
21:36 Outro
Special thanks to Anish Tondwalkar for discussing some of these concepts with me.
Note: At 21:10, A_t should have been inside the min. Thanks @t.w.7065 for catching this.
35% off our evals course: https://bit.ly/evals-ai
Vincent introduces Marimo, a reactive notebook environment, and walks us through its features, including interactive, reactive charts and widget integration. He demonstrates how these components can be used to build annotation apps for evals, and highlights the differences between Marimo and traditional Jupyter notebooks.
Links:
1. Repo w/notebook: https://github.com/koaning/molabel
2. Vincent's drawing pad: https://www.amazon.com/Inspiroy-H640P-Graphics-Battery-Free-Sensitivity/dp/B075T6MTJX
3. Vincent's sites: https://koaning.io , and https://calmcode.io/
00:00 Introduction to Data Science Journey
00:27 Exploring the Chick Weight Dataset
00:57 Interactive Data Analysis with Marimo
02:04 Importance of Looking at Data
03:32 Advanced Data Visualization Techniques
05:14 Introduction to Marimo's Unique Features
06:44 Reactive Programming in Marimo
12:50 AI Integration and Custom Rules
15:30 Marimo's Storage and Export Options
27:16 Advanced Visualization and Annotation
37:10 Introduction to Any Widget
37:45 Building Custom Widgets
38:56 Showcasing the Scatter Widget
40:29 Defining Widgets with Any Widget
45:58 Annotation Widgets and Their Uses
52:14 Exploring More Widget Capabilities
01:01:32 Marimo's App Mode and Deployment
01:03:37 Final Thoughts and Future Directions
01:04:45 Q&A and Closing Remarks
EASIEST Way to Fine-Tune a LLM and Use It With Ollama
Get started with 10Web and their AI Website Builder API: https://10web.io/website-builder-api/?utm_source=YouTube&utm_medium=Influencer&utm_campaign=TechWithTim
Today, you'll learn how to fine-tune LLMs in Python for use in Ollama. I'll walk you through it step by step, give you all the code and show you how to test it out.
DevLaunch is my mentorship program where I personally help developers go beyond tutorials, build real-world projects, and actually land jobs. No fluff. Just real accountability, proven strategies, and hands-on guidance. Learn more here - https://training.devlaunch.us/tim
⏳ Timestamps ⏳
00:00 | What is Fine-Tuning?
02:25 | Gathering Data
05:52 | Google Colab Setup
09:17 | Fine-Tuning with Unsloth
16:58 | Model Setup in Ollama
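The last step in the timestamps above, loading the fine-tuned model into Ollama, is typically done with a Modelfile. A minimal sketch, assuming the fine-tuned model was exported to GGUF (the filename, parameter value, and system prompt are placeholders, not from the video):

```
# Modelfile: registers a local GGUF export with Ollama
FROM ./my-finetuned-model.gguf

# Sampling default (illustrative value)
PARAMETER temperature 0.7

SYSTEM "You are a helpful assistant fine-tuned on my custom dataset."
```

`ollama create my-model -f Modelfile` then builds the model, and `ollama run my-model` lets you chat with it.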
🎞 Video Resources 🎞
Code in this video: https://drive.google.com/drive/folders/1p4ZilsJsdxB5lH6ZBMdIEJBt0WVUMsDq?usp=sharing
Google Colab notebook: https://colab.research.google.com/drive/1NsRGmHVupulRzsq9iUTx8V8WgTSpO_04?usp=sharing
Hashtags
#Python #Ollama #LLM
Try out Warp 2.0 now, the current rank #1 AI on Terminal Bench, outperforming Claude Code: https://go.warp.dev/bycloud
You can also use code "BYCLOUD" to get Warp Pro for 1 month free. (limited to 1,000 redemptions)
My Newsletter
https://mail.bycloud.ai/
my project: find, discover & explain AI research semantically
https://findmypapers.ai/
My Patreon (get bundle access for my newsletter & findmypapers)
https://www.patreon.com/c/bycloud
Training language models to follow instructions with human feedback
[Paper] https://arxiv.org/abs/2203.02155
DeepSeek-R1 (Aha Moment)
[Paper] https://arxiv.org/abs/2501.12948
Understanding R1-Zero-Like Training: A Critical Perspective
[Paper] https://arxiv.org/pdf/2503.20783
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
[Paper] https://arxiv.org/abs/2504.13837
Reinforcement Learning Finetunes Small Subnetworks in Large Language Models
[Paper] https://arxiv.org/abs/2505.11711
Spurious Rewards: Rethinking Training Signals in RLVR
[Paper] https://arxiv.org/abs/2506.10947
Try out my new fav place to learn how to code https://scrimba.com/?via=bycloudAI
This video is supported by the kind Patrons & YouTube Members:
🙏Nous Research, Chris LeDoux, Ben Shaener, DX Research Group, Poof N' Inu, Andrew Lescelius, Deagan, Robert Zawiasa, Ryszard Warzocha, Tobe2d, Louis Muk, Akkusativ, Kevin Tai, Mark Buckler, NO U, Tony Jimenez, Ângelo Fonseca, jiye, Anushka, Asad Dhamani, Binnie Yiu, Calvin Yan, Clayton Ford, Diego Silva, Etrotta, Gonzalo Fidalgo, Handenon, Hector, Jake Disco very, Michael Brenner, Nilly K, OlegWock, Daddy Wen, Shuhong Chen, Sid_Cipher, Stefan Lorenz, Sup, tantan assawade, Thipok Tham, Thomas Di Martino, Thomas Lin, Richárd Nagyfi, Paperboy, mika, Leo, Berhane-Meskel, Kadhai Pesalam, mayssam, Bill Mangrum, nyaa,
Toru Mon
[Discord] https://discord.gg/NhJZGtH
[Twitter] https://twitter.com/bycloudai
[Patreon] https://www.patreon.com/bycloud
[Business Inquiries] bycloud@smoothmedia.co
[Profile & Banner Art] https://twitter.com/pygm7
[Video Editor] @Booga04
[Ko-fi] https://ko-fi.com/bycloudai
How to Fine Tune your own LLM using LoRA (on a CUSTOM dataset!)
That gameboy blender animation...took 6 hours to render 😅. Anyway, had a ton of fun coding this up and finally getting back to some proper ML. I've been thi...
THIS is why large language models can understand the world
5 years ago, nobody would have guessed that scaling up LLMs would be as successful as it has been. This skepticism was due, in part, to the fact that all known statistical learning theory predicted that massively oversized models should overfit, and hence perform worse than smaller models. Yet the undeniable fact is that modern LLMs do possess models of the world that allow them to generalize beyond their training data.
Why do larger models generalize better than smaller models? Why does training a model to predict internet text cause it to develop world models? Come deep dive into the inner workings of neural network training to understand why scaling LLMs works so damn well.
Want to see more videos like this in the future? Support me on Ko-fi https://ko-fi.com/algorithmicsimplicity
Papers referenced:
Double Descent: https://arxiv.org/abs/1812.11118
The Lottery Ticket Hypothesis: https://arxiv.org/abs/1803.03635
My previous videos on Autoregressive Transformers:
Auto-regression (and diffusion): https://youtu.be/zc5NTeJbk-k
Transformers: https://youtu.be/kWLed8o5M2Y
Anthropic publish most of the system prompts for their chat models as part of their release notes. They recently shared the new prompts for both Claude Opus 4 and Claude …
Reinforcement Learning with Neural Networks: Essential Concepts
Reinforcement Learning has helped train neural networks to win games, drive cars and even get ChatGPT to sound more human when it responds to your prompt. Th...