Iterative Reasoning Preference Optimization
Self-Rewarding Language Models
Diffusion Model Alignment Using Direct Preference Optimization
The Jiminy Advisor: Moral Agreements among Stakeholders Based on Norms and Argumentation | Journal of Artificial Intelligence Research
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Emotion prediction as computation over a generative theory of mind | Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation