Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training · #Deception #Large Language Models #Paper #PDF · arxiv.org · Jan 13, 2024
Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure · #Deception #Large Language Models #Paper #PDF · arxiv.org · Nov 15, 2023
Role-Play with Large Language Models · #Large Language Models #Dialogue #Deception #Self-Awareness #Paper #PDF #DeepMind · arxiv.org · Nov 13, 2023