Search cyberveille.decio.ch

Found 6 bookmarks

Custom sorting

Many-shot jailbreaking \ Anthropic

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

#anthropic #EN #2024 #AI #LLM #Jailbreak #Many-shot

·anthropic.com·Jan 8, 2025

Many-shot jailbreaking \ Anthropic

Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability

The jailbreak technique "Bad Likert Judge" manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails. The jailbreak technique "Bad Likert Judge" manipulates LLMs to generate harmful content using Likert scales, exposing safety gaps in LLM guardrails.

#unit42 #EN #2024 #LLM #Jailbreak #Likert

·unit42.paloaltonetworks.com·Jan 8, 2025

Bad Likert Judge: A Novel Multi-Turn Technique to Jailbreak LLMs by Misusing Their Evaluation Capability

Data Exfiltration from Slack AI via indirect prompt injection

This vulnerability can allow attackers to steal anything a user puts in a private Slack channel by manipulating the language model used for content generation. This was responsibly disclosed to Slack (more details in Responsible Disclosure section at the end).

#promptarmor #EN #2024 #Slack #prompt-injection #LLM #vulnerability #steal #indirect-prompt #injection

·promptarmor.substack.com·Aug 20, 2024

Data Exfiltration from Slack AI via indirect prompt injection

Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models

At Project Zero, we constantly seek to expand the scope and effectiveness of our vulnerability research. Though much of our work still relies on traditional methods like manual source code audits and reverse engineering, we're always looking for new approaches. As the code comprehension and general reasoning ability of Large Language Models (LLMs) has improved, we have been exploring how these models can reproduce the systematic approach of a human security researcher when identifying and demonstrating security vulnerabilities. We hope that in the future, this can close some of the blind spots of current automated vulnerability discovery approaches, and enable automated detection of "unfuzzable" vulnerabilities.

#googleprojectzero #EN #2024 #Offensive #Project-Naptime #LLM

·googleprojectzero.blogspot.com·Jun 21, 2024

Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models

Security Brief: TA547 Targets German Organizations with Rhadamanthys Stealer

What happened Proofpoint identified TA547 targeting German organizations with an email campaign delivering Rhadamanthys malware. This is the first time researchers observed TA547 use Rhadamanthys,...

#proofpoint #EN #2024 #LLM #chatgpt #analysis #TA547 #Rhadamanthys #Stealer

·proofpoint.com·Apr 17, 2024

Security Brief: TA547 Targets German Organizations with Rhadamanthys Stealer

Diving Deeper into AI Package Hallucinations

Lass Security's recent research on AI Package Hallucinations extends the attack technique to GPT-3.5-Turbo, GPT-4, Gemini Pro (Bard), and Coral (Cohere).

#lasso #EN #2024 #AI #Package #Hallucinations #GPT-4 #Bard #Cohere #analysis #LLM

·lasso.security·Mar 28, 2024

Diving Deeper into AI Package Hallucinations