Simple probes can catch sleeper agents \ Anthropic · anthropic.com · Apr 24, 2024