Recent Jailbreaks Demonstrate Emerging Threat to DeepSeek
Evaluation of three jailbreaking techniques on DeepSeek shows risks of generating prohibited content. Evaluation of three jailbreaking techniques on DeepSeek shows risks of generating prohibited content.
EPFL: des failles de sécurité dans les modèles d'IA
Les modèles d'intelligence artificielle (IA) peuvent être manipulés malgré les mesures de protection existantes. Avec des attaques ciblées, des scientifiques lausannois ont pu amener ces systèmes à générer des contenus dangereux ou éthiquement douteux.
Exclusive: Chinese researchers develop AI model for military use on back of Meta's Llama
Papers show China reworked Llama model for military tool China's top PLA-linked Academy of Military Science involved Meta says PLA 'unauthorised' to use Llama model * Pentagon says it is monitoring competitors' AI capabilities
Data Exfiltration from Slack AI via indirect prompt injection
This vulnerability can allow attackers to steal anything a user puts in a private Slack channel by manipulating the language model used for content generation. This was responsibly disclosed to Slack (more details in Responsible Disclosure section at the end).
Project Naptime: Evaluating Offensive Security Capabilities of Large Language Models
At Project Zero, we constantly seek to expand the scope and effectiveness of our vulnerability research. Though much of our work still relies on traditional methods like manual source code audits and reverse engineering, we're always looking for new approaches. As the code comprehension and general reasoning ability of Large Language Models (LLMs) has improved, we have been exploring how these models can reproduce the systematic approach of a human security researcher when identifying and demonstrating security vulnerabilities. We hope that in the future, this can close some of the blind spots of current automated vulnerability discovery approaches, and enable automated detection of "unfuzzable" vulnerabilities.
Security Brief: TA547 Targets German Organizations with Rhadamanthys Stealer
What happened Proofpoint identified TA547 targeting German organizations with an email campaign delivering Rhadamanthys malware. This is the first time researchers observed TA547 use Rhadamanthys,...
Lass Security's recent research on AI Package Hallucinations extends the attack technique to GPT-3.5-Turbo, GPT-4, Gemini Pro (Bard), and Coral (Cohere).
Personal Information Exploit on OpenAI’s ChatGPT Raise Privacy Concerns
Last month, I received an alarming email from someone I did not know: Rui Zhu, a Ph.D. candidate at Indiana University Bloomington. Mr. Zhu had my email address, he explained, because GPT-3.5 Turbo, one of the latest and most robust large language models (L.L.M.) from OpenAI, had delivered it to him.
Earlier this week, the Republican National Committee released a video that it claims was “built entirely with AI imagery.” The content of the ad isn’t especially novel—a dystopian vision of America under a second term with President Joe Biden—but the deliberate emphasis on the technology used to create it stands out: It’s a “Daisy” moment for the 2020s.