Test Information Space

Found 5 bookmarks

Influence and cyber operations: an update, October 2024

·openai.com·Oct 9, 2024

AI deception: A survey of examples, risks, and potential solutions

·cell.com·May 10, 2024

Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training

·arxiv.org·Jan 13, 2024

Technical Report: Large Language Models can Strategically Deceive their Users when Put Under Pressure

·arxiv.org·Nov 15, 2023

Role-Play with Large Language Models

·arxiv.org·Nov 13, 2023