Found 6 bookmarks
Custom sorting
Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability
Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability
The jailbreak technique "Bad Likert Judge" manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails. The jailbreak technique "Bad Likert Judge" manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails.
·unit42.paloaltonetworks.com·
Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability
Data Exfiltration from Slack AI via indirect prompt injection
Data Exfiltration from Slack AI via indirect prompt injection
This vulnerability can allow attackers to steal anything a user puts in a private Slack channel by manipulating the language model used for content generation. This was responsibly disclosed to Slack (more details in Responsible Disclosure section at the end).
·promptarmor.substack.com·
Data Exfiltration from Slack AI via indirect prompt injection
Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models
Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models
At Project Zero, we constantly seek to expand the scope and effectiveness of our vulnerability research. Though much of our work still relies on traditional methods like manual source code audits and reverse engineering, we're always looking for new approaches. As the code comprehension and general reasoning ability of Large Language Models (LLMs) has improved, we have been exploring how these models can reproduce the systematic approach of a human security researcher when identifying and demonstrating security vulnerabilities. We hope that in the future, this can close some of the blind spots of current automated vulnerability discovery approaches, and enable automated detection of "unfuzzable" vulnerabilities.
·googleprojectzero.blogspot.com·
Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models