Many-shot jailbreaking \ Anthropic
Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.
#anthropic #EN #2024 #AI #LLM #Jailbreak #Many-shot
anthropic.com · Jan 8, 2025
Anthropic researchers find that AI models can be trained to deceive
A study co-authored by researchers at Anthropic finds that AI models can be trained to deceive, and that this deceptive behavior is difficult to combat.
#techcrunch #EN #2024 #AI #models #study #deceive #research #Anthropic
techcrunch.com · Jan 15, 2024