Found 46 bookmarks
Custom sorting
Anil, C., Durmus, E., Sharma, M., Benton, J., Kundu, S., Batson, J., ... & Duvenaud, D. (2024). Many-shot Jailbreaking.
Anil, C., Durmus, E., Sharma, M., Benton, J., Kundu, S., Batson, J., ... & Duvenaud, D. (2024). Many-shot Jailbreaking.

Long contexts represent a new front in the struggle to control LLMs. We explored a family of attacks that are newly feasible due to longer context lengths, as well as candidate mitigations. We found that the effectiveness of attacks, and of in-context learning more generally, could be characterized by simple power laws. This provides a richer source of feedback for mitigating long-context attacks than the standard approach of measuring frequency of success

·www-cdn.anthropic.com·
Anil, C., Durmus, E., Sharma, M., Benton, J., Kundu, S., Batson, J., ... & Duvenaud, D. (2024). Many-shot Jailbreaking.