Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability
The jailbreak technique "Bad Likert Judge" manipulates LLMs into generating harmful content by misusing Likert-scale evaluation prompts, exposing safety gaps in LLM guardrails.
·unit42.paloaltonetworks.com·
Data Exfiltration from Slack AI via indirect prompt injection
This vulnerability can allow attackers to steal anything a user puts in a private Slack channel by manipulating the language model used for content generation. This was responsibly disclosed to Slack (more details in the Responsible Disclosure section at the end).
·promptarmor.substack.com·
Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models
At Project Zero, we constantly seek to expand the scope and effectiveness of our vulnerability research. Though much of our work still relies on traditional methods like manual source code audits and reverse engineering, we're always looking for new approaches. As the code comprehension and general reasoning ability of Large Language Models (LLMs) has improved, we have been exploring how these models can reproduce the systematic approach of a human security researcher when identifying and demonstrating security vulnerabilities. We hope that in the future, this can close some of the blind spots of current automated vulnerability discovery approaches, and enable automated detection of "unfuzzable" vulnerabilities.
·googleprojectzero.blogspot.com·
Personal Information Exploit on OpenAI’s ChatGPT Raises Privacy Concerns
Last month, I received an alarming email from someone I did not know: Rui Zhu, a Ph.D. candidate at Indiana University Bloomington. Mr. Zhu had my email address, he explained, because GPT-3.5 Turbo, one of the latest and most robust large language models (L.L.M.) from OpenAI, had delivered it to him.
·nytimes.com·
Large Language Models and Elections
Earlier this week, the Republican National Committee released a video that it claims was “built entirely with AI imagery.” The content of the ad isn’t especially novel—a dystopian vision of America under a second term with President Joe Biden—but the deliberate emphasis on the technology used to create it stands out: It’s a “Daisy” moment for the 2020s.
·schneier.com·