Iterative Reasoning Preference Optimization · arxiv.org · May 1, 2024 · #Reasoning #Preferences #Paper #PDF #Meta #Large Language Models #Algorithms #Chain of Thought
Self-Rewarding Language Models · arxiv.org · Jan 20, 2024 · #AI #Meta #Paper #PDF #Large Language Models #Preferences #Autonomous