Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models #Large Language Models #Evaluation #Peer Review #Paper #PDF · Cohere · arxiv.org · Apr 30, 2024
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation #Large Language Models #RLHF #Evaluation #Paper #PDF · Cohere · arxiv.org · Oct 27, 2023