Beyond Preferences in AI Alignment
Iterative Reasoning Preference Optimization
Humans prefer AI-generated copy, survey finds
Self-Rewarding Language Models
Diffusion Model Alignment Using Direct Preference Optimization
The Alignment Problem: Machine Learning and Human Values with Brian Christian
Google Bard readies ‘Memory’ to adapt to important details about you
The Jiminy Advisor: Moral Agreements among Stakeholders Based on Norms and Argumentation (Journal of Artificial Intelligence Research)
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Emotion prediction as computation over a generative theory of mind (Philosophical Transactions of the Royal Society A)
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Researchers From Stanford and DeepMind Propose Using Large Language Models (LLMs) as a Proxy Reward Function
Joscha Bach on Twitter