Constitutional Classifiers: Defending against Universal Jailbreaks... #Anthropic #Classification #Safety #Large Language Models #Paper #PDF · arxiv.org · Feb 3, 2025
Core Views on AI Safety: When, Why, What, and How #Anthropic #AI #Safety · anthropic.com · Mar 9, 2023