Iterative Reasoning Preference Optimization · #Reasoning #Preferences #Paper #PDF #Meta #Large Language Models #Algorithms #Chain of Thought · arxiv.org · May 1, 2024
Self-Rewarding Language Models · #AI #Meta #Paper #PDF #Large Language Models #Preferences #Autonomous · arxiv.org · Jan 20, 2024
Diffusion Model Alignment Using Direct Preference Optimization · #Fine-Tuning #Stable Diffusion #Preferences #Large Language Models #Paper #PDF · arxiv.org · Nov 24, 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model · #Large Language Models #Preferences #Reward #Training #Paper #PDF · arxiv.org · Jun 4, 2023
Researchers From Stanford And DeepMind Come Up With The Idea of Using Large Language Models LLMs as a Proxy Reward Function · #Preferences #Proxy #Large Language Models · marktechpost.com · Mar 9, 2023