The Surprising Effectiveness of Test-Time Training for Abstract Reasoning
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming
A taxonomy and review of generalization research in NLP
Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning