OpenEQA: From word models to world models
OpenEQA combines challenging open-vocabulary questions with the ability to answer in natural language. This results in a straightforward benchmark that demonstrates a strong understanding of the environment—and poses a considerable challenge to current foundational models. We hope this work motivates additional research into helping AI understand and communicate about the world it sees.
·ai.meta.com·
Anil, C., Durmus, E., Sharma, M., Benton, J., Kundu, S., Batson, J., ... & Duvenaud, D. (2024). Many-shot Jailbreaking.
Long contexts represent a new front in the struggle to control LLMs. We explored a family of attacks that are newly feasible due to longer context lengths, as well as candidate mitigations. We found that the effectiveness of attacks, and of in-context learning more generally, could be characterized by simple power laws. This provides a richer source of feedback for mitigating long-context attacks than the standard approach of measuring frequency of success.
·www-cdn.anthropic.com·
Nay, J. J., Karamardian, D., Lawsky, S. B., Tao, W., Bhat, M., Jain, R., ... & Kasai, J. (2024). Large language models as tax attorneys: a case study in legal capabilities emergence. Philosophical Transactions of the Royal Society A, 382(2270), 20230159.
·royalsocietypublishing.org·