Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models
Investigating Continual Pretraining in Large Language Models: Insights and Implications
On The Fairness Impacts of Hardware Selection in Machine Learning
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
Elo Uncovered: Robustness and Best Practices in Language Model Evaluation
Which Prompts Make The Difference? Data Prioritization For Efficient Human LLM Evaluation
Goodtriever: Adaptive Toxicity Mitigation with Retrieval-augmented Models
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Intriguing Properties of Quantization at Scale