The Surprising Effectiveness of Test-Time Training for Abstract Reasoning · #Large Language Models #Training #Testing #Paper #PDF · arxiv.org · Dec 9, 2024
MART: Improving LLM Safety with Multi-round Automatic Red-Teaming · #Testing #Large Language Models #Paper #PDF · arxiv.org · Nov 17, 2023
A taxonomy and review of generalization research in NLP · #Natural Language Processing #Generalization #Research #Testing #Paper #PDF · nature.com · Oct 19, 2023
Testing AI performance on less frequent aspects of language reveals insensitivity to underlying meaning · #AI #Performance #Testing #Paper #PDF #Comprehension · arxiv.org · Feb 28, 2023