Anil, C., Durmus, E., Sharma, M., Benton, J., Kundu, S., Batson, J., ... & Duvenaud, D. (2024). Many-shot Jailbreaking.

Long contexts represent a new front in the struggle to control LLMs. We explored a family of attacks that are newly feasible due to longer context lengths, as well as candidate mitigations. We found that the effectiveness of attacks, and of in-context learning more generally, could be characterized by simple power laws. This provides a richer source of feedback for mitigating long-context attacks than the standard approach of measuring frequency of success.

·www-cdn.anthropic.com·
Nay, J. J., Karamardian, D., Lawsky, S. B., Tao, W., Bhat, M., Jain, R., ... & Kasai, J. (2024). Large language models as tax attorneys: a case study in legal capabilities emergence. Philosophical Transactions of the Royal Society A, 382(2270), 20230159.
·royalsocietypublishing.org·