RLHF Workflow: From Reward Modeling to Online RLHF
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
Human Feedback is not Gold Standard