Direct Preference Optimization: Your Language Model is Secretly a Reward Model | DPO paper explained · #Fine-Tuning, #Large Language Models, #Feedback · youtube.com · Dec 26, 2023
The Flan Collection: Advancing open source methods for instruction tuning · #Instruction, #Machine Learning, #Fine-Tuning, #Prompt Engineering, #Research, #Google, #Opensource, #Feedback · ai.googleblog.com · Feb 2, 2023