Test Information Space

Found 4 bookmarks

Learning to Plan & Reason for Evaluation with Thinking-LLM-as-a-Judge

arxiv.org · Feb 1, 2025

Prometheus 2: An Open Source Language Model Specialized in Evaluating Other Language Models

arxiv.org · May 3, 2024

Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models

arxiv.org · Apr 30, 2024

Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation

arxiv.org · Oct 27, 2023