Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
#RLHF #Reinforcement Learning #Large Language Models #Paper #PDF · arxiv.org · Feb 26, 2024